Abstract. We prove a rigidity theorem for the geometry of the unit ball in
random subspaces of the scl norm in $B_1^H$ of a free group. In a free group
$F$ of rank $k$, a random word $w$ of length $n$ (conditioned to lie in $[F,F]$) has
$\mathrm{scl}(w) = \log(2k-1)\,n/6\log(n) + o(n/\log(n))$ with high probability, and the
unit ball in a subspace spanned by $d$ random words of length $O(n)$ is close
to a (suitably affinely scaled) octahedron.

A conjectural generalization to hyperbolic groups and manifolds (discussed 
in the appendix) would show that the length of a random geodesic in a hyper- 
bolic manifold can be recovered from the bounded cohomology of the funda- 
mental group. 



1. Introduction 

Mostow's Rigidity Theorem says that a homotopy equivalence between closed 
hyperbolic manifolds of dimension at least three is homotopic to an isometry. It 
follows that geometric invariants of a hyperbolic manifold have (at least in principle) 
a purely topological definition. This is most apparent in Gromov's famous proof [50] 
of the Rigidity Theorem which proceeds by showing that an obviously topological 
invariant, namely the Gromov (or $L^1$) norm of the fundamental class in homology,
is proportional to the volume in any hyperbolic metric. As observed by Thurston
[31], a similar argument shows that for any locally symmetric space $M$ modeled on
a symmetric space $X$ there is a constant $C(X)$ so that the norm of the fundamental
class $\|[M]\|_1$ satisfies

$$\|[M]\|_1 = C(X) \cdot \mathrm{vol}(M)$$

However, the determination of the constant $C(X)$ in any given case is extremely
difficult. Haagerup and Munkholm [35] showed for $X$ equal to hyperbolic $n$-space
$\mathbb{H}^n$ that $C(\mathbb{H}^n) = 1/v_n$ where $v_n$ is the volume of the regular ideal hyperbolic
$n$-simplex, and Bucher-Karlsson [4] showed that $C(\mathbb{H}^2 \times \mathbb{H}^2) = 3/2\pi^2$. The proofs are
very hard, and underscore the difficulty of computing the exact values of (nonzero) 
Gromov norms. 

In this paper we prove a new kind of rigidity theorem for the 2-dimensional rela- 
tive Gromov norm (or what is the same thing, the stable commutator length norm) 
in a free group $F$. This is a norm on a vector space $B_1^H(F)$, the homogenization
of the space $B_1$ of real group 1-boundaries (in the bar complex). The space $B_1^H$
is infinite dimensional, but its geometry can be probed by restricting attention to
finite dimensional subspaces. Our main theorem is a rigidity result for the geometry
of the unit ball in random finite dimensional subspaces of $B_1^H$ (technically: in
subspaces spanned by random elements of fixed length). We show that these unit balls
are (suitably scaled) $C^0$ close to octahedra (i.e. the unit ball in $\mathbb{R}^d$ with its usual
$L^1$ norm). We also determine the exact scaling constant, and show that it has a
simple expression in terms of the growth exponent of the free group (i.e. the entropy
of the Markov process that generates random reduced words). We concentrate in
this paper on the case of free groups for clarity of exposition, but similar results
should hold for random words in arbitrary hyperbolic groups, or random geodesics
in negatively curved manifolds, with an analogous formula for the scaling constant.
We explain the idea of this generalization in an appendix, but save the details for 
a follow-up paper. 

Recall that stable commutator length is an algebraic stabilization of the topological
notion of filling genus. If $X$ is a space, and $\Gamma : \coprod_i S^1 \to X$ is a homologically
trivial 1-manifold, the filling genus of $\Gamma$ is the least genus of a surface $S$ mapping
to $X$ whose boundary represents the homotopy class of $\Gamma$. The stable commutator
length $\mathrm{scl}(\Gamma)$ is the infimum of $-\chi(S)/2n$ over all $n$ and all surfaces $S$ mapping to
$X$ whose boundary represents a cover $\widetilde{\Gamma}$ of $\Gamma$ of degree $n$. If $G$ is a group and $X$ is
a space with $\pi_1(X) = G$, loops in $X$ correspond to conjugacy classes in $G$, and the
geometric definition given above defines in a natural way a pseudo-norm on $B_1(G)$,
the space of (real) 1-boundaries; i.e. finite formal real linear combinations of
elements in $G$ representing $0$ in (real) homology. For $G$ a hyperbolic group, scl descends
to a norm on a suitable homogenized quotient $B_1^H(G) := B_1/\langle g - hgh^{-1}, g^n - ng \rangle$.
Precise definitions are given in § 3.

Our first main theorem concerns the stable commutator length of a random 
element of [F, F] of prescribed length n (we assume without comment that n is 
even, since a reduced element of odd length is never in [F, F] ) . Here "random" 
means with respect to the uniform probability on the finite set of reduced words 
of length $n$ in $[F, F]$ (when $n$ is even). For clarity, we frequently use the standard
Landau "big O/little o" notation, so the expression $O(g(x))$ denotes some function
$f(x)$ satisfying $f(x) \le C|g(x)|$ for some positive constant $C$ and for all $x \gg 0$, the
expression $\Theta(g(x))$ denotes some function $f(x)$ satisfying $C_1 g(x) \le f(x) \le C_2 g(x)$
for some positive constants $C_1$, $C_2$ and for all $x \gg 0$, the expression $o(g(x))$ denotes
some function $f(x)$ satisfying $\lim_{x \to \infty} f(x)/g(x) = 0$, and so on. See e.g. [23] for a
reference.

Random Rigidity Theorem 4.1. Let $F$ be a free group of rank $k$, and let $v$ be a
random reduced element of length $n$, conditioned to lie in the commutator subgroup
$[F, F]$. Then for any $\epsilon > 0$ and $C > 1$,

$$|\mathrm{scl}(v) \log(n)/n - \log(2k-1)/6| \le \epsilon$$

with probability $1 - O(n^{-C})$.

In particular, this implies that $\mathrm{scl}(v) \log(n)/n$ converges in probability to
$\log(2k-1)/6$ as $n \to \infty$.

In more geometric language, we derive strong control on the geometry of the 
unit ball in the scl norm in a random subspace. 

Random Norm Theorem 4.16. Let $F$ be a free group of rank $k$, and for fixed $d$,
let $v_1, v_2, \ldots, v_d$ be independent random reduced elements of length $n_1, n_2, \ldots, n_d$,
conditioned to lie in $[F, F]$, where without loss of generality we assume $n_1 \ge n_i$ for
all $i$. Let $V$ be the subspace of $B_1^H(F)$ spanned by the $v_i$. Then for any $\epsilon > 0$, $C > 1$
and real numbers $t_i$,

$$\Big|\mathrm{scl}\Big(\sum_i t_i v_i\Big) \log(n_1)/n_1 - \log(2k-1)\Big(\sum_i |t_i| n_i\Big)/6n_1\Big| \le \epsilon$$

with probability $1 - O(n_1^{-C})$.

In words: the unit ball in the scl norm scaled by $n_1/\log(n_1)$ converges to the unit
ball in the norm $\|\sum t_i v_i\| = \sum |t_i| n_i/n_1$ in the $C^0$ topology and in probability, as
$n_1 \to \infty$. If $n_i = n_1 + o(n_1)$ for all $i$, the unit ball is close to a (scaled)
octahedron.
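As a small illustration of the limiting norm, the following sketch evaluates $\sum |t_i| n_i/n_1$ and lists the vertices of its unit ball. It assumes the norm is read exactly as in the sentence above; the function names `limit_norm` and `octahedron_vertices` are hypothetical and not taken from the paper.

```python
def limit_norm(t, n):
    """Evaluate sum_i |t_i| * n_i / n_1 for coefficients t and word lengths n.

    Assumes n[0] is the largest length, as in the theorem statement.
    """
    n1 = n[0]
    return sum(abs(ti) * ni / n1 for ti, ni in zip(t, n))


def octahedron_vertices(n):
    """Return the 2d points +-(n_1 / n_i) e_i, the vertices of the unit ball."""
    d = len(n)
    vertices = []
    for i in range(d):
        for sign in (1, -1):
            point = [0.0] * d
            point[i] = sign * n[0] / n[i]
            vertices.append(point)
    return vertices
```

When all the lengths are equal the vertices are $\pm e_i$, which is the standard octahedron (the $L^1$ unit ball).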

It is worth remarking that the speed of convergence is very slow. Our asymp- 
totic theorems depend on the distribution of the subwords of a random word at a 
particular characteristic scale: for a word of length n, we focus on the subwords 
of length 0(log(n)). There are some "boundary effects" which suggest a heuristic 
correction to our asymptotic formula which becomes insignificant only when log(n) 
is sufficiently large. Computer experiments (described in § 5) show this heuristic
correction to be in very good agreement with reality. However we are not able to 
rigorously justify this observation nor obtain a precise asymptotic estimate of the 
error. 

1.1. Acknowledgments. We would like to thank Jeremy Kahn and Richard Sharp 
for some useful conversations about this material. We would also like to thank the 
anonymous referee for helpful comments and suggestions. Danny Calegari was 
supported by NSF grant DMS 1005246. 

2. The random reduced word 

2.1. Reduced words. Fix a free group F of rank k and a free generating set. The 
generators will be denoted a, b, c and so on, and their inverses by A, B, C. 

We are interested in random reduced words conditioned to lie in the commutator 
subgroup. This is a complicated (non-local) condition to impose on a word. For- 
tunately, there is a nice estimate, due to Sharp, of the relative proportion of words 
of length n in [F, F] . 

Theorem 2.1 (Sharp [29], Thm. 1). Let $F$ be a free group of rank $k \ge 2$. Let $F_n$
denote the set of elements of $F$ of length $n$, and let $F'_n = F_n \cap [F, F]$. If $n$ is odd,
$F'_n$ is empty, whereas there is an explicit constant $\sigma$ depending on $k$ so that

$$\lim_{n \to \infty,\ n \text{ even}} \sigma^k n^{k/2} \frac{|F'_n|}{|F_n|} = \frac{2}{(2\pi)^{k/2}}$$

where the limit is taken over even positive integers $n$.







This theorem has the following consequence. Suppose that a random element of
$F_n$ has some property $P$ with probability $1 - o(n^{-k/2})$. Then a random element
of $F'_n$ has property $P$ with probability $1 - o(1)$. In practice, we are interested
in properties of random elements in $F_n$ that hold with probability $1 - O(C^{-n^c})$
for some constants $C > 1$, $c > 0$, or with probability $1 - O(n^{-C})$ for all $C > 0$,
and Sharp's theorem is the fundamental tool that lets us draw conclusions about
random elements of $F'_n$.
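The proportion $|F'_n|/|F_n|$ can be computed exactly for small $n$. The sketch below uses the standard fact that a word lies in $[F,F]$ exactly when every generator has exponent sum zero, and runs a dynamic program over (last letter, exponent-sum vector). The function name and the integer encoding of letters are illustrative choices, not from the paper.

```python
from collections import defaultdict


def count_reduced_in_commutator(k, n):
    """Return (|F'_n|, |F_n|) for the free group of rank k and n >= 1.

    Letters are 0..2k-1; letter 2i is generator i and letter 2i+1 its inverse,
    so the inverse of letter x is x ^ 1.
    """
    states = defaultdict(int)
    for x in range(2 * k):
        vec = [0] * k
        vec[x // 2] += 1 if x % 2 == 0 else -1
        states[(x, tuple(vec))] += 1
    for _ in range(n - 1):
        nxt = defaultdict(int)
        for (last, vec), count in states.items():
            for x in range(2 * k):
                if x == (last ^ 1):
                    continue  # appending x would cancel the last letter
                new_vec = list(vec)
                new_vec[x // 2] += 1 if x % 2 == 0 else -1
                nxt[(x, tuple(new_vec))] += count
        states = nxt
    zero = tuple([0] * k)
    in_commutator = sum(c for (_, vec), c in states.items() if vec == zero)
    return in_commutator, sum(states.values())
```

For rank 2 and length 4 the only reduced words in $[F,F]$ are the eight words that alternate between $\{a, A\}$ and $\{b, B\}$, such as `abAB`.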

In the sequel we use the following notation consistently, where possible. We
let $v$ denote a random reduced word of length $n$, and let $m(n, k)$ (or just $m$ for
brevity) be defined by $m(n, k) := \log(n)/\log(2k-1)$. There is a stationary Markov
process which produces random reduced words in $F$ with the uniform probability,
and $\log(2k-1)$ is the entropy of this process.
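A minimal sketch of this Markov process, assuming generators are written `a, b, c, ...` with inverses in capitals as in the paper: the first letter is uniform among all $2k$ letters, and each later letter is uniform among the $2k-1$ letters that do not cancel the previous one. The helper names are hypothetical.

```python
import math
import random


def random_reduced_word(k, n, rng=random):
    """Sample a uniformly random reduced word of length n in a rank-k free group."""
    lower = [chr(ord("a") + i) for i in range(k)]
    alphabet = lower + [c.upper() for c in lower]
    word = []
    while len(word) < n:
        if word:
            forbidden = word[-1].swapcase()  # the inverse of the previous letter
            choices = [c for c in alphabet if c != forbidden]
        else:
            choices = alphabet
        word.append(rng.choice(choices))
    return "".join(word)


def length_scale(n, k):
    """The characteristic scale m(n, k) = log(n) / log(2k - 1)."""
    return math.log(n) / math.log(2 * k - 1)
```

This samples from $F_n$ only; conditioning on $[F,F]$ (for example by rejection sampling) is a separate step not shown here.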

2.2. Phase transition. The constant $m = \log(n)/\log(2k-1)$ is a natural length
scale on which to view subwords of a random word of length n. A random word of 
length 100 like 

bbbbaBAbAABaBaabbabbaBAABBAABBAbabAAbbABBBAbaaaaBAAbbABaBabaBaBAbAABBBBaBabbaaBAAABaBabAbABaaaabbbAA 

does not look homogeneous to the naked eye; the long strings of capital letters leap 
out and draw the reader's attention to specific locations in the word. The meaning 
of the scale $m$ is that a random word of length $n$ (for sufficiently large $n$) looks
homogeneous on scales smaller than $m$, and heterogeneous on scales larger than $m$.
However for this phase transition to become truly apparent, one must take $n$ very
large, so that $m \sim \log(n) \gg 1$.

One way to quantify this distinction is to fix a length $\ell$ and compute some
statistic associated to the set of subwords of $v$ of length $\ell$. Each subword is an
element of $F_\ell$ (the set of elements of $F$ of length $\ell$), and a natural number to count
is

$$A_\ell(v) := \frac{1}{2n} \sum_{w \in F_\ell} |\text{copies of } w \text{ in } v - \text{copies of } w^{-1} \text{ in } v|$$

If $v$ is cyclically reduced, and one counts copies in the cyclic word $v$, then $A_\ell(v) \in
[0, 1]$ with $A_\ell(v) = 1$ if and only if no inverse pair of subwords of length $\ell$ appear.
There is a phase transition in $A_\ell$: for $\ell = Lm$ for some fixed $L < 1$ we have
$A_\ell(v) \to 0$ in probability, whereas for $\ell = Lm$ for some fixed $L > 1$ we have
$A_\ell(v) \to 1$. This is proved in § 2.4 and § 2.5.

For words of length $n = 10000$ in rank $k = 2$ we have $\log(n)/\log(2k-1) \approx
8.383613$. We compute $A_\ell(v)$ for a random word $v$ in $[F,F]$ of length 10000 for
$1 \le \ell \le 11$ (there are 236196 reduced words of length 11). This data is presented
in Figure 1. Note that conditioning $v$ to lie in $[F,F]$ forces $A_1(v) = 0$. The figure
hints at a phase transition at $\ell \sim m$ but for it to be really sharp, one would need
to take something like $n \sim$ googol.

[Plot: $A_\ell(v)$ on the vertical axis (up to 1) against $\ell$ on the horizontal axis.]

Figure 1. Values of $A_\ell(v)$ for $v$ a random word of length 10000
and $1 \le \ell \le 12$.
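The statistic plotted in Figure 1 can be computed along the following lines. This is a sketch under an assumption: the normalization $1/2n$ is chosen so that the value lies in $[0,1]$ and equals 1 exactly when no inverse pair occurs, as the text requires. Copies are counted in the cyclic word, and the function names are illustrative.

```python
from collections import Counter


def inverse(w):
    """Inverse of a word: reverse it and swap the case of every letter."""
    return w[::-1].swapcase()


def inverse_pair_statistic(v, ell):
    """Return (1/2n) * sum over words w of |#copies of w - #copies of w^{-1}|.

    v is treated as a cyclic word of length n, and ell must be at most n.
    """
    n = len(v)
    cyclic = v + v[: ell - 1]  # wrap around so that every start position counts
    counts = Counter(cyclic[i : i + ell] for i in range(n))
    # Only words that occur, or whose inverse occurs, contribute to the sum.
    relevant = set(counts) | {inverse(w) for w in counts}
    total = sum(abs(counts[w] - counts[inverse(w)]) for w in relevant)
    return total / (2 * n)
```

For the commutator `abAB` the statistic is 0 at $\ell = 1$, since each letter appears as often as its inverse, and 1 at $\ell = 2$, since no subword of length 2 appears together with its inverse.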

2.3. Counting functions and counting measures. We use the notation $F_i$,
$F_{<i}$, $F_{>i}$ and so on for the set of elements in $F$ of length $i$, $< i$, $> i$ respectively. A
random word of length $n$ is an element of $F_n$, chosen with the uniform probability
measure. Note that the cardinality of $F_n$ is $(2k)(2k-1)^{n-1}$, so $|F_{Lm}| \sim n^L$.
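The cardinality formula is easy to confirm by brute-force enumeration for small lengths. The sketch below is only a sanity check, with hypothetical helper names; it also recovers the count 236196 quoted earlier for rank 2 and length 11.

```python
def reduced_words(k, n):
    """Yield every reduced word of length n in a rank-k free group."""
    lower = [chr(ord("a") + i) for i in range(k)]
    alphabet = lower + [c.upper() for c in lower]

    def extend(prefix):
        if len(prefix) == n:
            yield prefix
            return
        for c in alphabet:
            if prefix and c == prefix[-1].swapcase():
                continue  # skip letters that cancel the previous one
            yield from extend(prefix + c)

    yield from extend("")


def sphere_size(k, n):
    """Closed form |F_n| = 2k (2k - 1)^(n - 1) for n >= 1."""
    return 2 * k * (2 * k - 1) ** (n - 1)
```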

Although it does not add much technically, we think of $F$ as a measure space,
with the Borel algebra consisting of all subsets. Consequently any function $f$ on $F$
is measurable, and a function $f$ is in $L^1(F)$ if and only if $\sum_{g \in F} |f(g)| < \infty$.




RANDOM RIGIDITY IN THE FREE GROUP 




Definition 2.2. For a reduced word $\sigma$, the counting function $C_\sigma$ is defined by

$$C_\sigma(v) = \text{number of copies of } \sigma \text{ in } v$$

and the counting measure $C(v)$ is the measure on $F$ of total mass $|v|(|v|-1)/2$ for
which $C(v)(\sigma) = C_\sigma(v)$.

For $f$ a measurable function on $F$, define

$$C_f(v) = \int_F f \, dC(v)$$

and define $H_f(v) := C_f(v) - C_f(v^{-1})$.
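For a single word $\sigma$ (that is, for $f$ the indicator function of $\sigma$) these definitions reduce to counting overlapping occurrences. A minimal sketch, with illustrative function names:

```python
def inverse(w):
    """Inverse of a word: reverse it and swap the case of every letter."""
    return w[::-1].swapcase()


def count_copies(sigma, v):
    """C_sigma(v): the number of (possibly overlapping) copies of sigma in v."""
    length = len(sigma)
    return sum(1 for i in range(len(v) - length + 1) if v[i : i + length] == sigma)


def h_count(sigma, v):
    """H_sigma(v) = C_sigma(v) - C_sigma(v^{-1})."""
    return count_copies(sigma, v) - count_copies(sigma, inverse(v))
```

Note that Python's built-in `str.count` skips overlapping matches, which is why the count is done position by position.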

2.4. Accurately estimating $C_\sigma(v)$. If $v$ is a random word of length $n$, and $\sigma$
is a random word of length $Lm$ where $L < 1$, we need to estimate $C_\sigma(v)$. Since
$v$ contains $n - |\sigma| + 1$ subwords of length $|\sigma|$, the "expected" number of copies
of $\sigma$ in $v$ is $(n - |\sigma| + 1)/|F_{|\sigma|}| = n^{1-L}(2k-1)/(2k) \pm O(\log(n))$. If subwords
were independent, one would expect the deviation from this expected value to be
typically of order $n^{(1-L)/2}$, and to be of order $n^{\epsilon+(1-L)/2}$ only with exponentially
vanishing probability. This is what we prove:

Proposition 2.3. Let $L < 1$. Then for any $\epsilon > 0$ there are constants $C > 1$ and
$c > 0$ so that

$$P\big(|C_\sigma(v) - n/|F_{Lm}|| < n^{\epsilon+(1-L)/2} \text{ for all } \sigma \in F_{Lm}\big) = 1 - O(C^{-n^c})$$

Proof. The strategy is as follows. We first show that for each fixed word $\sigma$ of length
$Lm$ the inequality $P(|C_\sigma(v) - n/|F_{|\sigma|}|| < n^{\epsilon+(1-L)/2}) = 1 - O(C^{-n^c})$ holds. Since
there are only $O(n^L) < n$ words of length $Lm$, it will follow that the desired
estimate will hold for every $\sigma \in F_{Lm}$ with probability $1 - O(nC^{-n^c})$. Absorbing
the $n$ factor into the constants $C$ and $c$, we will be done.

Choose some constant $N$ (we will decide on the exact value of $N$ later). For each
residue class $j$ mod $Nm$, let $v_{j,i}$ be the subword of $v$ of length $Lm$ which starts at
the $(j + iNm)$th letter of $v$. The point is that for fixed $j$, the $v_{j,i}$ for consecutive $i$
are "almost" independent. This is made precise in the following lemma:

Lemma 2.4. For any two words $x$, $y$ of length $|\sigma|$, there is an inequality

$$\big|P(v_{j,i} = x \mid v_{j,i-1} = y) - 1/|F_{|\sigma|}|\big| \le (2k-2)^{-(N-1)m}$$

Proof. Let $yuz$ be the subword of $v$ starting at $y$, where $u$ has length $(N-1)m$.
The number of words $u$ of fixed length for which $yuz$ is reduced depends only on
the length of $u$, the last letter of $y$, and the first letter of $z$. For any single letters
$a$, $b$ we let $u_m(a, b)$ denote the number of reduced words of the form $aub$ of length
$m + 2$. We show by induction on $m$ that the following two statements are true:

(1) $u_m(a, b) = u_m(a, c)$ if neither of $b$, $c$ are equal to $a^{(-1)^{m+1}}$

(2) $|u_m(a, a^{(-1)^{m+1}})/u_m(a, b) - 1| \le (2k-2)^{-m}$

Since $u_1(a, b) = (2k-2)$ if $b \ne a$ and $u_1(a, a) = (2k-1)$ this is true for $m = 1$.

Assume it is true for $(m-1)$ odd (for example). Then depending on the first
letter of $u$ we have two cases (by the induction hypothesis), and we deduce

$$u_m(a, b) = u_{m-1}(b, b) + (2k-2)u_{m-1}(c, b) \quad \text{for } b \ne A,\ c \ne b$$

$$u_m(a, A) = (2k-1)u_{m-1}(c, A) \quad \text{for } c \ne A$$

and the induction step is proved. The case $(m-1)$ even is analogous. The lemma
follows. □
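The two inductive statements can be checked exactly for small $m$ with a transfer-matrix count. In this sketch letters are encoded as integers $0, \ldots, 2k-1$ with the inverse of `x` equal to `x ^ 1`; the exceptional letter is taken to be $a$ for odd $m$ and $A$ for even $m$, which is how the exponent $(-1)^{m+1}$ is read here. All names are illustrative.

```python
from fractions import Fraction


def count_fillings(m, a, b, k):
    """Number of words u of length m such that the word a u b is reduced."""
    size = 2 * k
    ends = [0] * size  # ends[y] = number of reduced words a u' ending in y
    ends[a] = 1
    for _ in range(m):
        ends = [sum(ends[y] for y in range(size) if x != (y ^ 1)) for x in range(size)]
    return sum(ends[y] for y in range(size) if b != (y ^ 1))


def check_statements(m, k):
    """Check statements (1) and (2) for every first letter a, using exact arithmetic."""
    size = 2 * k
    for a in range(size):
        special = a if m % 2 == 1 else a ^ 1
        generic = {count_fillings(m, a, b, k) for b in range(size) if b != special}
        if len(generic) != 1:
            return False  # statement (1) fails
        ratio = Fraction(count_fillings(m, a, special, k), generic.pop())
        if abs(ratio - 1) > Fraction(1, (2 * k - 2) ** m):
            return False  # statement (2) fails
    return True
```

For $k = 2$ this reproduces $u_1(a,a) = 3$ and $u_1(a,b) = 2$, where the bound in (2) holds with equality.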
We resume the proof of Proposition 2.3. By Lemma 2.4 the probability that
$v_{j,i} = \sigma$ conditioned on the value of $v_{j,i-1}$ is very nearly independent of the value
of $v_{j,i-1}$, so we can compare the number of $\sigma$s among the $v_{j,i}$ (for fixed $j$) with a
sum of independent Bernoulli variables, and estimate the deviation from the mean
using the Chernoff bound. Let $C_{\sigma,j}(v)$ be the number of copies of $\sigma$ among the $v_{j,i}$.

Lemma 2.5. Suppose $N \ge 3$. For each $j$, and for any positive $\epsilon$, there is an
inequality

$$P\big(|C_{\sigma,j}(v) - n/(Nm \cdot |F_{|\sigma|}|)| > n^{\epsilon+(1-L)/2}\big) = O(C^{-n^c})$$

Proof. By Lemma 2.4 the conditional probability that successive $v_{j,i}$ are equal to $\sigma$
is never more than $1/|F_{|\sigma|}| + (2k-2)^{-(N-1)m}$, or less than $1/|F_{|\sigma|}| - (2k-2)^{-(N-1)m}$.
So we can bound the probability of a large deviation in terms of such large deviations
for sums of independent Bernoulli trials.

Since $(2k-2)^{-(N-1)m} < n^{-0.6(N-1)}$ (using the estimate $\log(2k-2)/\log(2k-1) >
0.6$ for $k \ge 2$), when $N \ge 3$ we have $(2k-2)^{-(N-1)m} < n^{-1.2}$.

We have the Chernoff bound (e.g. the upper bound in Thm. 1.3.13 from [30])

$$P(|S_n - np| > \delta np) < e^{-\delta^2 np/3}$$

where $S_n$ is a sum of $n$ independent Bernoulli random variables with parameter $p$.
Using $p_+ = n^{-L}(2k-1)/(2k) + n^{-0.6(N-1)} < n^{-L}$, we obtain

$$P\big(C_{\sigma,j}(v) - n/(Nm \cdot |F_{|\sigma|}|) - n^{1-0.6(N-1)}/Nm > \delta np_+/Nm\big) < e^{-\delta^2 np_+/3Nm}$$

Since $Nm = O(\log(n))$ and $n^{1-0.6(N-1)}/Nm < 1$, taking $\delta = n^{\epsilon-(1-L)/2}Nm$ this
implies

$$P\big(C_{\sigma,j}(v) - n/(Nm \cdot |F_{|\sigma|}|) > n^{\epsilon+(1-L)/2}\big) < O(C^{-n^c})$$

where $C > 1$, $c > 0$ depend only on $\epsilon$.

A similar inequality holds for $n/(Nm \cdot |F_{|\sigma|}|) - C_{\sigma,j}(v)$. □
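To get a feel for how much room the Chernoff bound leaves, one can compare it with an exact binomial tail. The sketch below uses the standard one-sided multiplicative form $P(S_n > (1+\delta)np) \le e^{-\delta^2 np/3}$ for $0 < \delta \le 1$, which may differ in detail from the statement cited from [30]. The parameter values in the usage note are arbitrary.

```python
import math


def binomial_upper_tail(n, p, threshold):
    """Exact P(S_n >= threshold) for S_n a sum of n independent Bernoulli(p) trials."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(threshold, n + 1)
    )


def chernoff_upper_bound(n, p, delta):
    """The bound exp(-delta^2 * n * p / 3) on P(S_n > (1 + delta) * n * p)."""
    return math.exp(-(delta**2) * n * p / 3)
```

For example, with `n = 200`, `p = 0.1` and `delta = 0.5`, `binomial_upper_tail(200, 0.1, 31)` is the exact probability of exceeding $(1+\delta)np = 30$, and it lies well below `chernoff_upper_bound(200, 0.1, 0.5)`.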

We now complete the proof of Proposition 2.3. Since $j$ was arbitrary, it follows
that every $C_{\sigma,j}(v)$ deviates from $n/(Nm \cdot |F_{|\sigma|}|)$ by at most $n^{\epsilon+(1-L)/2}$ with
probability at least $1 - Nm \cdot O(C^{-n^c})$ which is still $1 - O(C^{-n^c})$. Hence $C_\sigma(v) =
\sum_j C_{\sigma,j}(v)$ deviates from $n/|F_{|\sigma|}|$ by at most $Nm \cdot n^{\epsilon+(1-L)/2} < n^{\epsilon'+(1-L)/2}$ with
the same probability. The proposition follows. □

In Appendix A we compare this result with Chernoff-type inequalities for non-
reversible Markov chains obtained by Lezaud, Dinwoodie and others, and interpret 
such bounds in terms of the Cheeger constants of certain directed graphs. 

2.5. Bounding $C_\sigma(v^{-1})$. We now turn our attention to words of length $> m$.
Fix some $L > 1$, and let $S$ be the set of subwords of $v$ of length $Lm$.

Proposition 2.6. For any $\epsilon$ there are constants $C > 1$ and $c > 0$ so that

$$P\Big(\sum_{\sigma \in S} C_\sigma(v^{-1}) < n^{2-L+\epsilon}\Big) = 1 - O(C^{-n^c})$$

In particular, for $\epsilon < L - 1$, with probability $1 - O(C^{-n^c})$ there is a subset $S'$ of $S$
with

$$\mathrm{card}(S - S') < n^{2-L+\epsilon} = o(n/\log(n))$$

so that no element $\sigma \in S'$ appears in $v^{-1}$.

Remark 2.7. Note that we think of $S$ just as a set, not a set with multiplicity. For
applications, it will be important to show that the cardinality of $S$ is close to $n$
with probability $1 - O(C^{-n^c})$; we show this as Proposition 2.11.

Remark 2.8. The set of words of length $Lm$ has cardinality of order $n^L$, so the
subset $S$ has measure of order $n^{1-L}$. If we fix in advance any subset $S$ of $F_{Lm}$ of
this measure, a robust Chernoff-type bound for Markov chains due to Lezaud (see
Appendix A) gives a bound on $\sum_{\sigma \in S} C_\sigma(v^{-1})$. However this estimate cannot be
applied naively to our context, since $S$ depends (very strongly) on $v$.

Proof. It is awkward to find a purely probabilistic proof of this estimate, because 
overlapping subwords of v are necessarily very highly correlated. The non-proba- 
bilistic ingredient in our proof is the following simple, but important observation: 

Lemma 2.9. Let $v$ be a reduced word. Then for any reduced word $\sigma$, no copy of $\sigma$
in $v$ can overlap a copy of $\sigma^{-1}$.

Proof. If $\sigma$ overlaps $\sigma^{-1}$, then without loss of generality we can write $\sigma$ as $xy$ where
$y = y^{-1}$. But this is absurd. □
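Lemma 2.9 is simple enough to confirm by exhaustive search over short reduced words. The following sketch does this for small lengths; it is a sanity check rather than a proof, and the helper names are hypothetical.

```python
def inverse(w):
    """Inverse of a word: reverse it and swap the case of every letter."""
    return w[::-1].swapcase()


def reduced_words(k, n):
    """Yield every reduced word of length n in a rank-k free group."""
    lower = [chr(ord("a") + i) for i in range(k)]
    alphabet = lower + [c.upper() for c in lower]

    def extend(prefix):
        if len(prefix) == n:
            yield prefix
            return
        for c in alphabet:
            if prefix and c == prefix[-1].swapcase():
                continue
            yield from extend(prefix + c)

    yield from extend("")


def has_overlapping_inverse_pair(v):
    """True if some subword sigma of v overlaps a copy of sigma^{-1} in v."""
    n = len(v)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            target = inverse(v[i : i + length])
            for j in range(n - length + 1):
                # Two windows of the same length overlap iff their starts differ by < length.
                if v[j : j + length] == target and abs(i - j) < length:
                    return True
    return False
```

The non-reduced word `aA` is its own inverse, so it is flagged, while no reduced word is.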

Now, for each $i$, let $v_i$ be the subword of $v$ of length $Lm$ starting at the $i$th
letter, and let $v_{<i}$ and $v_{>i}$ denote the part of $v$ outside $v_i$, so that $v = v_{<i}v_iv_{>i}$ as
a reduced word. Further, let $S_{<i}$ (resp. $S_{>i}$) denote the subset of $S$ consisting of
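subwords of length $Lm$ in $v_{<i}$ (resp. $v_{>i}$). By Lemma 2.9, a copy of $v_i^{-1}$ in $v$ cannot
overlap $v_i$, so

$$\sum_{\sigma \in S} C_\sigma(v^{-1}) \le \sum_i \Big(\sum_{\sigma \in S_{<i}} C_\sigma(v_i^{-1}) + \sum_{\sigma \in S_{>i}} C_\sigma(v_i^{-1})\Big)$$

The point is that we can bound $\sum_{\sigma \in S_{<i}} C_\sigma(v_i^{-1})$ in probability conditioned on $v_{<i}$,
independently of $v_{>i}$.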

Lemma 2.10. For any $\epsilon$,

$$P\Big(\sum_{\sigma \in S_{<i}} C_\sigma(v_i^{-1}) = 1 \,\Big|\, v_{<i}\Big) < n^{1-L+\epsilon}$$

Proof. Note that $\sum_{\sigma \in S_{<i}} C_\sigma(v_i^{-1})$ is 0 or 1, depending on whether $v_i^{-1}$ is in the set
$S_{<i}$ or not. No matter what $v_{<i}$ is, there are $(2k-1)^{Lm} > n^{L-\epsilon}$ choices for $v_i$, and
each occurs with the uniform probability. The cardinality of $S_{<i}$ is at most $i$ which
is less than $n$, so the chance that $v_i^{-1}$ is in $S_{<i}$ is at most $n^{1-L+\epsilon}$, as claimed. □

It follows that if we fix a residue $j$ mod $Lm$, for any $\epsilon$ there are $C > 1$, $c > 0$
such that we can estimate

$$P\Big(\sum_{i \equiv j \bmod Lm} \sum_{\sigma \in S_{<i}} C_\sigma(v_i^{-1}) > n^{2-L+\epsilon}/Lm\Big) < O(C^{-n^c})$$

Summing over all residue classes $j$, and then replacing $S_{<i}$ by $S_{>i}$ by symmetry
proves the proposition. □

As remarked above, it is important for applications to show that the cardinality
of $S$ is very close to $n$, with high probability.

Proposition 2.11. Fix $L > 1$ and let $S$ denote the set of subwords of $v$ of length
$Lm$. There is an $\epsilon > 0$ and $C > 1$, $c > 0$ so that

$$P(n - \mathrm{card}(S) > n^{1-\epsilon}) = O(C^{-n^c})$$

Proof. The proof is almost the same as that of Proposition 2.6, except that we need
to estimate the number of $\sigma$ for which some copy of $\sigma$ overlaps itself, and show this
is $< n^{1-\epsilon}$ for some $\epsilon$ with the desired probability.

There are two kinds of overlaps to consider: those for which the nonoverlapping
initial segment of the first word has length $< 2m/3$ ("big overlaps") and those
for which it has length $\ge 2m/3$ ("little overlaps"). We count the number of each
independently.

A big overlap results in a subword of the form $wuw$ where the length of $w$ is
at least $m/6$ and the length of $u$ is at least $m/6$. Conditioned on $w$ and $u$, the
probability that the next word will be a copy of $w$ is at most $n^{-1/6}$, so there are
at most $n^{5/6}$ subwords of $v$ that are contained in a big overlap. A little overlap
results in a subword of the form $ww$ where the length of $w$ is at least $2m/3$. Again,
conditioned on $w$, the probability that the next word will be a copy of $w$ is at most
$n^{-2/3}$, so there are at most $n^{1/3}$ subwords of $v$ that are contained in little overlaps.
Each subword is contained in at most $Lm = O(\log(n))$ overlaps of either kind. The
result follows. □

3. Stable commutator length 
The material in this section is standard. A basic reference is [8]. 

3.1. Definitions. 

Definition 3.1. Let $G$ be a group, and $[G, G]$ the commutator subgroup. The
commutator length of an element $g \in [G, G]$, denoted $\mathrm{cl}(g)$, is the least number of
commutators whose product is $g$; and the stable commutator length, denoted $\mathrm{scl}(g)$,
is the limit $\mathrm{scl}(g) := \lim_{n \to \infty} \mathrm{cl}(g^n)/n$.

The definition of (stable) commutator length can be extended to finite formal
sums as follows:

Definition 3.2. Let $G$ be a group, and let $\{g_i\}$ be a finite collection of elements
with $\prod_i g_i \in [G, G]$. Define $\mathrm{cl}(\sum g_i)$ to be the minimum of $\mathrm{cl}(\prod g_i^{h_i})$ over all
products of conjugates $g_i^{h_i}$ of the $g_i$. This is symmetric, and a class function in
each $g_i$ separately. Define $\mathrm{scl}(\sum g_i) = \lim_{n \to \infty} \mathrm{cl}(\sum g_i^n)/n$.

Let $C_1(G)$ be the real vector space with basis the elements of $G$, and let $B_1(G)$ be
the kernel of $C_1(G) \to H_1(G; \mathbb{R})$. So $B_1(G)$ is the space of formal finite real linear
combinations of elements in $G$ that represent $0$ in (real) homology. Equivalently,
$B_1(G)$ is the image of the vector space of real 2-chains (in the bar complex) under
$\partial$. It is a fact that scl extends by linearity and continuity to a pseudo-norm on
$B_1(G)$, and vanishes on the subspace $\langle g - hgh^{-1}, g^n - ng \rangle$. This vanishing reflects
the homogeneity of scl and the fact that it is a class function in each variable
separately. So scl descends to a pseudo-norm on the quotient
$B_1^H(G) := B_1(G)/\langle g - hgh^{-1}, g^n - ng \rangle$.

The following theorem is nice to know, but is not used in an essential way in this 
paper: 

Theorem 3.3 (Calegari-Fujiwara [9]). Let $G$ be (word) hyperbolic. Then scl is a
norm on $B_1^H(G)$.

3.2. Surfaces. Let $X$ be a space with $\pi_1(X) = G$, and for any finite collection
of conjugacy classes $g_i$ let $\Gamma : \coprod_i S^1_i \to X$ be a 1-manifold in the associated free
homotopy class. A map of a (compact, oriented) surface $f : S \to X$ is admissible if
there is a commutative diagram

$$\begin{array}{ccc} \partial S & \longrightarrow & S \\ {\scriptstyle \partial f} \downarrow & & \downarrow {\scriptstyle f} \\ \coprod_i S^1_i & \stackrel{\Gamma}{\longrightarrow} & X \end{array}$$

and an integer $n(S)$ for which $\partial f_*[\partial S] = n(S)[\coprod_i S^1_i]$ in $H_1$. The map is monotone
if $\partial S \to \coprod_i S^1_i$ is homotopic to an orientation-preserving cover (equivalently, if
every component of $\partial S$ wraps with positive degree around its image).

Lemma 3.4 ([5], Prop. 2.74). Let $g_1, \ldots, g_m$ be conjugacy classes in $G$, represented
by $\Gamma : \coprod_i S^1_i \to X$. Then

$$\mathrm{scl}\Big(\sum_i g_i\Big) = \inf_S \frac{-\chi^-(S)}{2n(S)}$$

where the infimum is taken over all surfaces $S$ and all maps $f : S \to X$ admissible
for $\Gamma$.

The notation $\chi^-(S)$ means the sum of Euler characteristics $\chi(S_i)$ taken over
those components $S_i$ of $S$ with $\chi(S_i) \le 0$. By [8], Prop. 2.13 it suffices to restrict
to monotone admissible surfaces. An admissible surface $S$ is extremal if equality is
achieved.



3.3. Fatgraphs. If $F$ is free, $X$ can be taken to be a graph, and any admissible
surface can be represented combinatorially (possibly after performing some
compressions) by a fatgraph. Fatgraphs are combinatorial objects which allow one to
move back and forth between group theory/combinatorics and 2-dimensional
topology; a standard reference is [26], especially § 1.

A fatgraph $Y$ is a graph together with a cyclic ordering of the edges incident at
each vertex. Such a graph can be thickened to a compact surface $S(Y)$ (or just $S$ if
$Y$ is understood) in such a way that $Y$ embeds in $S(Y)$ as a deformation retract. A
fatgraph $Y$ is oriented if $S(Y)$ is oriented. In the sequel we assume all our fatgraphs
are oriented, and have no 1-valent vertices. Note that $\chi(Y) = \chi(S(Y))$.

A fatgraph over $F$ is a fatgraph with oriented edges labeled by words in $F$ so that
opposite sides get inverse labels, and the cyclic words obtained by reading around
$\partial S(Y)$ are reduced. By abuse of notation we write $\partial Y$ in place of $\partial S(Y)$ and think
of it as an element of $B_1^H(F)$. Figure 2 gives an example of an extremal fatgraph
for the chain $a + b + AB + [a, b]$ in $F_2$. Note that extremal surfaces do not need to
be connected.

The basic fact we use is the following lemma, which is a restatement of [17],
Thm. 1.4 in the language of fatgraphs.

Lemma 3.5 (Culler [17], Thm. 1.4 (fatgraph lemma)). Let $S$ be an admissible
surface bounding a chain $\Gamma$. Then after possibly compressing $S$ a finite number of
times (thereby reducing $-\chi^-(S)$ without changing $\partial S$) there is a fatgraph $Y$ over
$F$ with $S(Y) = S$ and $\partial Y = \Gamma$.

Figure 2. An extremal surface for $a + b + AB + [a, b]$ represented
as a fatgraph.

Remark 3.6. Culler proves his theorem only for surfaces with connected boundary, 
but his argument generalizes with no extra work. An equivalent statement, valid 
for surfaces with disconnected boundary, is also proved in [5], Lem. 3.4; also see [8] 
§4.3 for a discussion and references. 

Let $Y$ be an extremal fatgraph for $v$. The underlying fatgraph might not be
trivalent, but by splitting higher valence vertices, and inserting (unlabeled) "dummy
edges", we can think of $Y$ as a trivalent fatgraph in a degenerate way, where some
degenerate "edges" have length 0. We call this the operation of resolving vertices
(such a resolution need not be unique).

Lemma 3.7. Let $Y$ be an extremal fatgraph for $v$, so that $\partial Y$ represents $Nv$ for
some $N$, and $-\chi(Y)/2N = \mathrm{scl}(v)$. Resolve vertices of $Y$ so that $Y$ is trivalent,
possibly with some edges of length 0. Let the average length of the edges of $Y$ be
$\ell m$. Then

$$\mathrm{scl}(v) = n\log(2k-1)/12\ell\log(n)$$

Proof. Suppose $Y$ has $V$ vertices and $E$ edges. Since $Y$ is trivalent, $2E/3 = V$ and
$-\chi(Y) = E - V = E/3$. On the other hand, the total length of $\partial Y$ is $Nn = 2E\ell m$.
Hence

$$\mathrm{scl}(v) = -\chi(Y)/2N = E/6N = n/12\ell m = n\log(2k-1)/12\ell\log(n)$$

□ 

It will be our goal to show that for random $v$ of length $n \gg 1$, the extremal
fatgraph $Y$ has $\ell = 1/2 + o(1)$ with probability $1 - o(1)$.

Remark 3.8. The reader who is unhappy with edges of length 0 can just take $\ell m$
to be equal to the total length of $Y$ divided by $E + \sum_v (\mathrm{valence}(v) - 3)$.
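The arithmetic of Lemma 3.7 is easy to check numerically: with average edge length $\ell m$ and $\ell = 1/2$, the formula reproduces the leading term $\log(2k-1)\,n/6\log(n)$ of the Random Rigidity Theorem. The sketch below only evaluates these closed-form expressions; the function names are illustrative.

```python
import math


def scl_from_edge_length(n, k, ell):
    """Lemma 3.7: scl(v) = n log(2k - 1) / (12 ell log n)."""
    return n * math.log(2 * k - 1) / (12 * ell * math.log(n))


def predicted_scl(n, k):
    """Leading term log(2k - 1) n / (6 log n) from the Random Rigidity Theorem."""
    return math.log(2 * k - 1) * n / (6 * math.log(n))


def trivalent_euler_characteristic(num_edges):
    """chi = V - E for a trivalent graph, where V = 2E/3 (E must be divisible by 3)."""
    if num_edges % 3 != 0:
        raise ValueError("a trivalent graph has a number of edges divisible by 3")
    return 2 * num_edges // 3 - num_edges
```

Shorter edges (smaller $\ell$) mean more vertices per unit of boundary length, hence a larger value of $-\chi$ and a worse bound on scl.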

4. Random values of scl 

The goal of this section is to prove the Random Rigidity Theorem: 

Theorem 4.1 (Random Rigidity Theorem). Let $F$ be a free group of rank $k$, and
let $v$ be a random reduced element of length $n$, conditioned to lie in the commutator
subgroup $[F, F]$. Then for any $\epsilon > 0$ and $C > 1$,

$$|\mathrm{scl}(v) \log(n)/n - \log(2k-1)/6| \le \epsilon$$

with probability $1 - O(n^{-C})$.

The proof will occupy most of the remainder of the section. 

4.1. Upper bounds. The upper bound in the Random Rigidity Theorem is sharp- 
ened by the following proposition: 

Proposition 4.2. Let $v$ be a random reduced word in the commutator subgroup of
length $n$. Then for any $\epsilon > 0$ there are constants $C > 1$ and $c > 0$ so that

$$\mathrm{scl}(v) \log(n)/n - \log(2k-1)/6 \le \epsilon$$

with probability $1 - O(C^{-n^c})$.

Given random $v$, we explicitly build an extremal surface (actually an extremal
fatgraph) by gluing together a very large number of tripods with edges of length
slightly less than $(1 - \epsilon)m/2$. The fact that such tripods can be glued up to
produce a fatgraph with boundary very close to a multiple of $v$ follows from an
equidistribution lemma, derived from the estimates in § 2, which holds with very
high probability for most random $v$. The tripods do not glue up completely, but
the mass of the unglued part has size $O(n^{-\epsilon/2})$ compared to the glued part, and the
remainder can be glued up (under the hypothesis that $v$ is homologically trivial)
with a contribution to $\chi$ proportional to the mass.

4.2. Tripods and joints. In what follows we generally adhere to the notational
convention that group inverses are denoted by small and capital letters; hence $X$
means $x^{-1}$ and so on.

Definition 4.3. A tripod of edge length $L$ is a fatgraph with underlying graph a
tripod, and with edges labeled by reduced words $xY$, $yZ$, $zX$ where each of $x$, $y$, $z$
(the incoming edge labels) has length $L$. We denote such a tripod $T(x, y, z)$.

A copy of $T(x, y, z)$ is a triple of segments of the form $xY$, $yZ$, $zX$ in $v$. These
segments may appear anywhere in $v$; they might or might not be adjacent, and are
allowed to overlap each other.

Lemma 4.4. A triple $x$, $y$, $z$ of reduced words of length $L$ are the labels of a tripod
if and only if their last letters are distinct. Consequently, for any reduced word $xY$
of length $2L$, there are $(2k-2)(2k-1)^{L-1}$ choices for $z$.

Proof. Obvious. □ 

There are $(2k)(2k-1)(2k-2)(2k-1)^{3L-3}/3 = \Theta((2k-1)^{3L})$ tripods $T$ of edge
length $L$. For each tripod $T$, let $\partial T$ denote the triple of words $xY$, $yZ$, $zX$.
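The count of tripods can be checked by brute force for small $k$ and $L$. The sketch below uses the criterion of Lemma 4.4 (distinct last letters), counts ordered triples, and divides by 3 for the cyclic symmetry of $T(x, y, z)$; the closed form it is compared against, $2k(2k-1)(2k-2)(2k-1)^{3L-3}/3$, is how the count is read here. Function names are hypothetical.

```python
from itertools import product


def reduced_words(k, n):
    """Return all reduced words of length n in a rank-k free group."""
    lower = [chr(ord("a") + i) for i in range(k)]
    alphabet = lower + [c.upper() for c in lower]
    words = [""]
    for _ in range(n):
        words = [w + c for w in words for c in alphabet
                 if not w or c != w[-1].swapcase()]
    return words


def count_tripods_brute_force(k, L):
    """Count triples (x, y, z) with distinct last letters, up to cyclic rotation."""
    words = reduced_words(k, L)
    ordered = sum(
        1 for x, y, z in product(words, repeat=3)
        if len({x[-1], y[-1], z[-1]}) == 3
    )
    return ordered // 3  # x, y, z are distinct, so each rotation orbit has size 3


def count_tripods_formula(k, L):
    """Closed form 2k (2k-1) (2k-2) (2k-1)^(3L-3) / 3."""
    return 2 * k * (2 * k - 1) * (2 * k - 2) * (2 * k - 1) ** (3 * L - 3) // 3
```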
Definition 4.5. A joint of edge length $L$ is a fatgraph with underlying graph a
segment, and with edges labeled by reduced words $x$, $X$ each of length $L$. Denote
such a joint $J(x)$.

A copy of $J(x)$ is an ordered pair of segments of the form $x$, $X$ in $v$. Again, these
segments may appear anywhere in $v$ (note that since $v$ is reduced, these segments
cannot overlap or be adjacent in $v$). We distinguish between orientations, so that
$J(x)$ and $J(X)$ are different.

Each joint $J(x)$ is contained in a unique maximal joint $J(x')$.

Fix $L$ with $L/m = 1/2 - \epsilon$ for some small $\epsilon$. For a word $v$, let $T_L(v)$ denote the
set of copies of tripods of edge length $L$ in $v$, and let $J_L(v)$ denote the set of copies
of joints of edge length $L$ in $v$. Note that each pair of subwords $x$, $X$ of length $L$
in $v$ determines two elements of $J_L(v)$. We define an involution $\iota$ on the set $J_L(v)$
interchanging such pairs. If $v$ is understood, we just write $T_L$ and $J_L$.

Given $T$, a copy of $T(x, y, z)$ of length $L$, there are three associated joints $J(x)$,
$J(y)$, $J(z)$ which can be extended uniquely to maximal joints $J(x')$, $J(y')$ and $J(z')$.
Note that $x$ is a suffix of $x'$, and so on. Define $\partial T(x, y, z) = J(x') + J(y') + J(z')$
and extend $\partial$ to a linear map from the space of measures on $T_L$ to the space of
measures on $J_L$.

Example 4.6. Let v = ABBAbAABAAbababaabbABBBBabbaaB. The tripod of 
length 2 as indicated; 

ABB AbAA BAAbabab aabb AB BBBa bbaaB 

is associated to three joints: a pair Ab, Ba; a pair AA, aa; and a pair bb, BB. The 
joint Ab, Ba is contained in a maximal joint of length 5: 

ABBAb AABAAbababaabbABBB Babba aB 
and the joint aa, AA is contained in a maximal joint of length 4: 

ABBAb AABA Abab abaa bbABBBBabbaaB 
whereas the joint bb, BB of length 2 is already maximal: 

ABBAbAABAAbababaabbABBBBabbaaB 
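The extension to maximal joints in Example 4.6 can be reproduced mechanically. In the sketch below a joint is given by the start positions (0-indexed) of its two segments; the copy of $x$ is extended to the left and the copy of $X$ to the right for as long as the new letters are inverse to each other, so that $x$ stays a suffix of $x'$. The function name and the position-based interface are illustrative assumptions.

```python
def maximal_joint(v, i, j, L):
    """Extend the joint with x = v[i:i+L] and X = v[j:j+L] to a maximal joint.

    Returns the pair (x', X'), where x is a suffix of x' and X a prefix of X'.
    """
    x, big_x = v[i : i + L], v[j : j + L]
    if big_x != x[::-1].swapcase():
        raise ValueError("the two segments are not inverse to each other")
    start, end = i, j + L
    # Grow x leftwards and X rightwards while the added letters are inverses.
    while start > 0 and end < len(v) and v[start - 1].swapcase() == v[end]:
        start -= 1
        end += 1
    return v[start : i + L], v[j:end]


EXAMPLE = "ABBAbAABAAbababaabbABBBBabbaaB"
```

With this encoding, `maximal_joint(EXAMPLE, 3, 23, 2)` returns `("ABBAb", "Babba")`, the maximal joint of length 5 from the example, while the joints `aa, AA` and `BB, bb` extend to lengths 4 and 2 respectively.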

The next lemma, although a simple consequence of the estimates in § 2.4, is
key. It shows that with very high probability, the collection of all tripods of length
$(1/2 - \epsilon)m$ can be almost exactly glued up in pairs:

Lemma 4.7. Let $L/m = 1/2 - \epsilon$. Then with probability $1 - O(C^{-n^c})$ there is an
inequality $|\partial\mu - \iota\partial\mu| = O(n^{-\epsilon/2}|\mu|)$, where $\mu$ is the uniform measure on $T_L$, and
$|\cdot|$ denotes mass of a (possibly signed) measure.

Proof. For any given $J(x)$ in $v$ contained in a maximal $J(x')$ we estimate the number
of tripods $T(x', y, z)$ with $J(x')$ in $\partial T$. First of all, $y$ is determined, since the copy
of $x$ associated to $J$ is the initial subword of some $xY$. Similarly, $z$ is determined,
since the copy of $X$ associated to $J$ is the terminal subword of some $zX$. Therefore
the number of tripods is simply equal to the number of subwords of the form $yZ$
in $v$.

The number of copies of yZ in v is approximately $n/|F_{2L}|$, i.e. about $n^{2\epsilon}$, with
an error of size 0{n'^^^~^^) for any δ, by Proposition 2.3. Taking δ = ε/6 for
concreteness, the error is at most ©(n^*^/^) which is a fraction 0(n~'/^) of the total
mass.
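
The leading-order count $n/|F_{2L}| \approx n^{2\epsilon}$ can be verified directly. As a sketch, assume the notation $m = \log(n)/\log(2k-1)$ used elsewhere in the paper, so that $(2k-1)^m = n$, and that $F_\ell$ denotes the set of reduced words of length $\ell$, so that $|F_\ell| = 2k(2k-1)^{\ell-1}$. With $2L = (1-2\epsilon)m$:

```latex
\[
|F_{2L}| = 2k(2k-1)^{2L-1}
         = \frac{2k}{2k-1}\,(2k-1)^{(1-2\epsilon)m}
         = \frac{2k}{2k-1}\, n^{1-2\epsilon},
\qquad\text{so}\qquad
\frac{n}{|F_{2L}|} = \frac{2k-1}{2k}\, n^{2\epsilon}.
\]
```

So the expected count agrees with $n^{2\epsilon}$ up to a bounded multiplicative constant depending only on k.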



Since this is true for every joint J(x), the lemma follows. □

4.3. Proof of upper bound. The proof of Proposition 4.2 is now straightforward:

Proof. Assemble the tripods and glue them in pairs along their common boundary
joints. By Lemma 4.7, all but 0(n~'^/^) of the measure of the set of tripods can
be glued up this way, with probability $1 - O(C^{-n^c})$. This holds even conditioning
on v ∈ [F,F] with probability $1 - O(C^{-n^c})$, with slightly different constants, by
Theorem 2.1.

This (partial) fatgraph Y can be extended (usually in many ways) to a complete
fatgraph bounding some multiple of v in $B_1^H(F)$ so that the Euler characteristic of
the added surface is proportional to the mass of the unglued part. We explain how
to do this.

Let N be the function on the letters of v whose value at a given letter is the
number of edges of tripods that contain it, and let N' be the maximum of N. The
function N' - N is therefore non-negative, and on the other hand max(N' - N) =
0{n~'^^^)N'. We translate the problem of building a fatgraph that extends Y as a
problem of suitably gluing together a collection of rectangles.

Each rectangle corresponds to some finite subword w of v which we call the label
of the rectangle. We think of the rectangle as having height 1 and width equal to 
the length of w. We keep track not only of w as a word in the generators, but 
also of where it appears as a subword of v. Color the top horizontal edge of the 
rectangle blue, and the vertical sides red. 



We want to glue together rectangles along segments of the boundary of integer
length, blue to blue and red to red, so that two red edges may be glued only if
the words associated to the rectangles are consecutive subwords of v, and two blue
segments are glued only if the paired letters on either side are inverse in F.

We take three rectangles for each copy of each tripod, with labels the subwords
of v corresponding to the edges of the tripod. We also take N' - N rectangles for
each letter of v, with label that letter. So we have lots of "long" rectangles (three
for each tripod) and far fewer "short" rectangles (of length 1). By the definition
of N' and N, every letter of v appears as the rightmost letter of a label exactly
as many times as the following letter of v appears as the leftmost letter of a label.
So we could think of taking N' strips labeled v and cutting them into long and
short rectangles; see Figure 3. Naturally, it is possible to glue up the red segments
in pairs compatibly. However, there are potentially many ways to do this, and it is
important to glue up blue edges first, as we now explain.

The long rectangles can be glued up along blue edges in threes to build fattened
tripods. Pairs of tripods can then be glued up along red edges corresponding to
joints (note that pairs of tripods are glued up in this manner along red segments of
length 2). The result can be thought of in an obvious way as the partial fatgraph
Y, where the blue edges are the core graph. See Figure 4.

Figure 3. N' copies of v cut up into long and short rectangles.




Figure 4. Long rectangles glued in threes along blue edges to 
make tripods, and tripods glued in pairs along red edges. 

Recall that by hypothesis v is homologically trivial, and note that the rectangles 
corresponding to a given tripod have the same number of copies of each generator 
as of its inverse. Consequently for each generator of F, there are as many short 
rectangles labeled with this generator as are labeled with its inverse. We can 
therefore glue together these short rectangles in pairs, so that every blue edge can 
be thus glued up. 

As observed above, the remaining unglued red segments can be glued up in pairs.
We now perform this gluing (in an arbitrary way). See Figure 5. Note that the result
might have corners at which more than two paired red edges meet.




Figure 5. Remaining red segments can be glued in pairs. This 
might produce red corners where more than two red edges meet. 
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The resulting surface Y' has no unglued red edges. The blue edges form the core 
of the surface, and the labels and the way the blue edges sit in the surface amounts 
to giving it the structure of a fatgraph over F whose boundary is a multiple of v. 
The fatgraph Y sits in Y' in an obvious way, and the contribution of Y' - Y to
-χ is of order (N' - N)|v|, which is very small compared to the contribution from
Y. In particular, the average edge length ℓm of Y' differs from the average edge
length of Y by at most 0{n~'^/^), and therefore satisfies ℓ > 1/2 - ε. The proof
now follows from Lemma 3.7. □

Remark 4.8. The use of ergodic theory to construct an almost equidistributed
collection of pieces with prescribed geometry that can be almost glued up is inspired
by the techniques in Kahn-Markovic's recent proof [23] of the surface subgroup
conjecture in 3-manifold topology, and we are pleased to acknowledge our intellectual
debt to this paper.

4.4. Lower bounds. The goal of the next few sections is to prove the following
estimate, which precisely complements Proposition 4.2. The Random Rigidity
Theorem (i.e. Theorem 4.1) follows immediately from these two propositions.

Proposition 4.9. Let v be a random reduced word in the commutator subgroup of
length n. Then for any ε > 0 and any C,

$\log(2k-1)/6 - \mathrm{scl}(v)\log(n)/n < \epsilon$

with probability $1 - O(n^{-C})$.

Note that the probability estimate associated to the upper bound is exponen- 
tial, whereas the estimate associated to the lower bound is merely polynomial (of 
arbitrarily large degree). This disparity is an artifact of the method of proof. A 
worse lower bound, but with exponential bounds on the probability of deviation, is 
obtained in § 5 using the method of quasimorphisms.

4.5. Combs. Let b be a subword of v, and consider some copy of b in the boundary
of an extremal fatgraph Y for v. Recall that by our convention we artificially split
open vertices of higher valence so that Y is trivalent, although it might have some
edges of length 0. The subword b is contained in a segment σ of Y, which is incident
to a sequence of edges $e_1, e_2, \ldots, e_d$ of Y in order. Call the subgraph of Y consisting
of the support of b together with the union of the $e_i$ a comb.

Let $c_1, \ldots, c_d$ be the labels on the edges $e_i$ (oriented to point in to σ). Furthermore,
the vertices of the $e_i$ subdivide b into subwords $b_0, \ldots, b_d$, where we stress
that some $b_i, c_i$ might have length 0. Then there are boundary labels of Y of the
form $B_dC_d,\ c_dB_{d-1}C_{d-1},\ \ldots,\ c_2B_1C_1,\ c_1B_0$ (see Figure 6). By the definition of an
extremal fatgraph, these boundary labels are (cyclic) subwords of v.



Figure 6. A comb with edge labels.



This suggests the following definition:

Definition 4.10. Given a word b ⊂ v, a comb on b is a family of subwords of v of
the form $B_dC_d,\ c_dB_{d-1}C_{d-1},\ \ldots,\ c_1B_0$. The complexity of the comb is d (as above)
and the length is L, where $|b| + \sum |c_i| = Lm$.

We would like to bound (in probability) the length of a comb in terms of its
complexity. Fix a big constant L', and let b ⊂ v be a subword of length L'm.
We would like to construct a comb on b for which L/d is as big as possible. This
amounts to choosing a partition of b into d successive subwords $b_i$ of length $L_im$
(where $L_i = 0$ is allowed), then choosing copies of $B_i$ in v, and defining $C_i$ to be the
maximal subword following the copy of $B_i$ for which $c_i$ precedes the copy of $B_{i-1}$.
Let these maximal $c_i$ have length $K_im$.

Note that the comb has length $L = \sum K_i + \sum L_i$ and complexity d. We
would like to bound in probability the maximum ratio L/(2d+1), at least for typical
b of some fixed length L'm where $L' = \sum L_i$.

By Proposition 2.3 and Proposition 2.6, there are almost exactly $n^{1-L_i}$ possible
locations of each $B_i$ in v for $L_i < 1$, and the chance that there is some $B_i$ at all
when $L_i > 1$ is at most $n^{1-L_i}$. If we assume that the prefixes and suffixes of the
$B_i$ of fixed length are evenly distributed, then for any fixed T, there should be an
estimate

$P(\sum K_i > T + \sum (1 - L_i)) = O(n^{-T})$

If T is very big but fixed, and small compared to $L' = \sum L_i$, then we can estimate
L < T + (d + 1), and therefore L/(2d+1) < 1/2 + ε for any ε with probability
$1 - O(n^{-T})$. This is good enough to give the desired bound in Proposition 4.9, by
Lemma 3.7.
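
The step from L < T + (d + 1) to the bound on L/(2d+1) is the following short computation. It is written out here as a sketch, under the assumption (implicit in the heuristic) that d is large compared to T/ε:

```latex
\[
\frac{L}{2d+1} \;<\; \frac{T + d + 1}{2d+1}
\;=\; \frac{1}{2} + \frac{2T+1}{2(2d+1)}
\;<\; \frac{1}{2} + \epsilon
\qquad\text{whenever}\qquad
d \;>\; \frac{2T+1}{4\epsilon} - \frac{1}{2}.
\]
```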

Notice that this heuristic argument is almost rigorous: prefixes and suffixes of
the $B_i$ are not perfectly independent, but their correlation decays exponentially
fast with the distance between $B_i$ and $B_j$. Thus we need only examine the cases
in which there are $c_{i+1}B_iC_i$ and $c_{j+1}B_jC_j$ that overlap. In order to obtain the
desired estimate, it is necessary to make some a priori assumptions about a comb
on b, which will turn out to be justified for most combs in any given extremal
fatgraph Y.

Definition 4.11. A subword b of v is δ-regular if there is no subword b' of length
(1 + δ)m such that B' is in v, and if all subwords of b of length δ and their inverses
are distinct.

A comb on b is δ-regular if b is δ-regular, and if all the $c_i$ have length at most
(1 + δ)m.

Let v be a random word in F' of length n, and let Y be an extremal trivalent
fatgraph for v (possibly with some edges of length 0). For any d, we can consider
the set of combs of Y of complexity d. The following lemma justifies the definition
of δ-regular:

Lemma 4.12. Let v be a random word in F' of length n, and let Y be an extremal
fatgraph for v. Then for any d, the proportion of combs of Y of complexity d that
are not δ-regular is at most $O(n^{-\delta/2})$, with probability $1 - O(C^{-n^c})$.

Proof. By Proposition 2.6, with probability $1 - O(C^{-n^c})$ there are at most $n^{1-\delta/2}$
subwords of v of length (1 + δ)m whose inverse also appears in v (in fact, we could
take any number < δ in place of δ/2); hence the proportion of combs of complexity
d that contain an edge of length > (1 + δ)m is at most $(4d+2)n^{-\delta/2}$, since every
edge of Y is contained in 4d + 2 combs of complexity d, and ∂Y represents Nv for
some N.

An argument similar to Proposition 2.11 establishes that subwords of typical b
of length δ are distinct, with probability $1 - O(C^{-n^c})$. □

4.6. Overlaps. We now restrict attention to a fixed δ-regular word b, and consider
a random word v conditioned to contain b as a subword. The arguments in this
section depend on order-of-magnitude estimates of probability, expressed as a power
of n.

Fix vectors of lengths $L_i, K_i < (1 + \delta)m$, and for each choice of d locations in v,
consider the probability that the subwords $c_iB_{i-1}C_{i-1}$ of length $K_i + L_{i-1} + K_{i-1}$
starting at these locations constitute a comb on b; we call such an occurrence a
matching, and we want to estimate the probability of a matching at a given d-tuple
of locations. We also refer to a vector of d locations in v as above as a configuration.
If the subwords do not overlap, this probability is less than $n^{-\sum (L_i + K_i)}$, so it
suffices to estimate the probability in the case that some subwords do overlap. This
is somewhat fiddly, and depends on an analysis of the combinatorial possibilities
for the overlap. However, the estimates in every case are entirely elementary.

For each j ≥ 2, let $P_jm$ be the total length where at least j words overlap.
Define the total overlap, counted with multiplicity, to be $P := \sum_{j \ge 2} P_j$. The total
contribution to P from overlaps of $B_i$ with $B_j$ will be O(δ), since b is δ-regular. If
part of some $c_i$ (resp. $C_i$) is contained in an overlap, but the corresponding part
of $C_i$ (resp. $c_i$) is not, this overlap does not significantly affect the probability of a
matching. If corresponding parts of $c_i$, $C_i$ both overlap $B_j$, then again necessarily
this overlap will be of size O(δ)m, since b is δ-regular. So to estimate the probability
of a matching, it suffices to consider overlaps among the various $c_i$, $C_j$. Let $P'_jm$
be the total length where at least j such subwords overlap, and analogously define
$P' := \sum_{j \ge 2} P'_j$.
Lemma 4.13. With notation as above, the probability of a matching in a given
configuration is at most $n^{P'/2 - \sum (L_i + K_i) + O(\delta)}$.

Proof. An overlap in some subword of $c_i$ of length lm must correspond to an overlap
in the corresponding subword of $C_i$ to increase the probability of a match by at
most $n^{l}$; so the increase over the "naive" probability of a match is at most a factor
of $n^{P'/2}$. □

On the other hand, there are n'' sets of locations of the subwords, and for each 
given location of one subword, there are only 0(log(n)) locations of any other sub- 
word that overlaps it. Two subwords CiDi^iCi-i and ejDj^iCj-i can contribute 
at most 2(1 + S) to P' , precisely if Ci = ej and Ci_i = C^-i- We deduce the 
following lemma: 

Lemma 4.14. Let $L_i, K_i$ be some fixed vector of lengths with $L_i, K_i < 1 + \delta$,
and define $L = \sum L_i + \sum K_i$. Suppose b is a δ-regular subword of v. Then
the probability that there is a comb over b with the prescribed lengths is at most
$O(n^{-T+O(\delta)})$ where T = L - d - 1. Consequently if L/(2d+1) > 1/2 + ε and δ
is sufficiently small compared to ε, and d is sufficiently big compared to ε, we can
make T as big as desired.
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Proof. As above, each set of locations has probability at most /2-l+o{S) ^j? ^ 
matching. Moreover, there are n'^ sets of locations, and at most n'^-^+oiS) gg^g q£ 
locations for which P' > 2r. The estimate follows. □ 

4.7. Proof of lower bound. We now give the proof of Proposition 4.9.

Proof. By Lemma 3.7, it suffices to show for every C and every ε that the average
length ℓm of the edges of an extremal fatgraph Y is at most 1/2 + ε, with probability
$1 - O(n^{-C})$. By Theorem 2.1, conditioning that v lies in [F,F] only affects
probabilities by at most a factor of $O(n^{k/2})$.

By Proposition 2.6, there are only O(1) subwords of v of length > 2m and
$O(n^{1-\delta/2})$ of length > (1 + δ)m, whose inverse also appears in v, with probability
$1 - O(C^{-n^c})$. So edges of length > (1 + δ)m affect ℓ negligibly, and the fraction of
combs containing such subwords is similarly negligible.

Choose some very large constant d, roughly of size O(1/ε), and consider the set
of all combs with complexity d in Y. Because Y is (formally) trivalent, every edge
occurs in exactly (4d + 2) such combs: each comb has 2d + 1 edges, and each edge
has two sides. By Lemma 4.12, if ℓ > 1/2 + ε, a definite fraction of these combs
must be δ-regular, and satisfy L/(2d+1) > 1/2 + ε.

On the other hand, by Lemma 4.14, for any δ-regular subword b and any given
vector of lengths < 1 + δ the probability that there is a comb over b with prescribed
lengths is at most $O(n^{-T+O(\delta)})$ where T = L - d - 1. Since there are at most
n possible locations in v for such a subword b, and since there are at most ((1 + 
S)m)^'^'^^ < vectors of lengths, the probability that there is any (5-regular comb 
with complexity d and length L is at most 0{n~^~^^~^^'^^^). So for any C and any 
ε, if d is sufficiently large and L/(2d+1) > 1/2 + ε, no such comb exists, with
probability $1 - O(n^{-C})$. The proof follows. □

Remark 4.15. A more careful analysis would almost certainly improve the estimate
of the probability of a large negative deviation. The probability that a specific
δ-regular subword is part of a δ-regular comb with big d and L/(2d+1) > 1/2 + ε
is polynomial in n, and to violate the desired lower bound on scl we must construct
a fatgraph containing a definite proportion of such big δ-regular combs. However,
the events that distinct subwords b, b' are parts of such δ-regular combs are not
obviously independent, and even estimating their correlation appears hard. Nevertheless,
heuristically one would expect the true probability of a deviation to be
exponential in (some power of) n.

4.8. The Random Norm Theorem. In fact, it is not much more work to derive
the following theorem, which specializes to Theorem 4.1 when d = 1:

Theorem 4.16 (Random Norm Theorem). Let F be a free group of rank k, and
for fixed d, let $v_1, v_2, \ldots, v_d$ be independent random reduced elements of length
$n_1, n_2, \ldots, n_d$ conditioned to lie in [F,F], where without loss of generality we assume
$n_1 \ge n_i$ for all i. Let V be the subspace of $B_1^H(F)$ spanned by the $v_i$. Then
for any ε > 0, C > 1 and real numbers $t_i$,

$|\mathrm{scl}(\sum t_iv_i)\log(n_1)/n_1 - \log(2k-1)(\sum |t_i|n_i)/6n_1| < \epsilon$

with probability $1 - O(n_1^{-C})$.

We remark before giving the proof that even though the geometry of a (random)
slice of the unit ball is very simple, the finer polyhedral structure is apparently
extremely complicated. Figure 7 and Figure 8 exhibit 2 and 3 dimensional slices of
the scl unit ball of some relatively simple words.




Figure 7. Unit ball in the scl norm in the subspace spanned by 
AbaBAbaBBAbaabaaBAAA and baaabAAbaabABAABBABa 



Proof. We give the proof in the case d = 2; the general case follows by essentially
the same argument. For any pair of reduced words $v_1, v_2$ (not necessarily in [F,F])
choose a word z of length 6 contained in [F,F] so that $v_1zv_2$ is reduced. We can
always find such a word z of the form xyXXYx for some generators x, y so that $v_1$
does not end, and $v_2$ does not begin, with X.

This defines a map $F_{|v_1|} \times F_{|v_2|} \to F_{|v_1|+|v_2|+6}$, and the pushforward of the
product of uniform measures is proportional to the uniform measure on the image,
with constant of proportionality independent of n. The relative proportion of the
image is a constant, so by Theorem 4.1, for any ε > 0, C > 1 we have

$|\mathrm{scl}(v_1zv_2)\log(n_1+n_2+6)/(n_1+n_2+6) - \log(2k-1)/6| < \epsilon$

with probability $1 - O(n_1^{-C})$. For $n_1 \ge n_2$ large, $\log(n_1+n_2+6)$ is very close to
$\log(n_1)$. On the other hand, $|\mathrm{scl}(v_1+v_2) - \mathrm{scl}(v_1zv_2)| \le$ const. It follows for any
ε > 0, C > 1, with probability $1 - O(n_1^{-C})$,

$|\mathrm{scl}(v_1+v_2) - \mathrm{scl}(v_1) - \mathrm{scl}(v_2)| < \epsilon n/\log(n)$

In particular, the boundary of the unit ball contains a point which is very close to
the midpoint of the points $v_1/\mathrm{scl}(v_1)$ and $v_2/\mathrm{scl}(v_2)$, and by convexity, the unit ball
in the positive quadrant of the $v_1, v_2$ plane is close to a triangle. Replacing $v_i$
by $v_i^{-1}$, the entire unit ball in the $v_1, v_2$ plane is $C^0$ close to a diamond. The higher
dimensional case is completely analogous. □
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Figure 8. Unit ball in the scl norm in the subspace spanned by 
aabAcAcBCC, bbcBaBaCAA and ccaCbCbABB 

5. QUASIMORPHISM LOWER BOUND 

In this section we exhibit an explicit quasimorphism which certifies a uniform
lower bound for scl of a random word. Unfortunately, this lower bound is not sharp,
for it exhibits only scl(v) ≥ n log(2k - 1)/12 log(n) (with high probability), which
is 1/2 of the correct value, by Theorem 4.1.

Experience shows that constructing explicit extremal quasimorphisms is difficult. 
For example, there is a polynomial time algorithm to produce an extremal surface 
for a chain in a free group, whereas there is no known algorithm (of any kind) 
to produce a certifying quasimorphism. Bjorklund-Hartnick [5] proved a central
limit theorem for quasimorphisms (on random walks; but these are very similar
to random words in the special case of free groups), and consequently any fixed
quasimorphism on F takes values of order $O(\sqrt{n})$ on words of length n. For this
reason, it is interesting to be able to construct an explicit quasimorphism which
gives the correct O(n/log(n)) order of magnitude. Another nice feature of the
construction is that the bound in probability is exponential in n, in contrast to the
polynomial bound in Proposition 4.9.

5.1. Quasimorphisms and Bavard Duality. A reference for the material in this
section is [8], especially Chapter 2.

Definition 5.1. Let G be a group. A quasimorphism is a function $\phi: G \to \mathbb{R}$ for which there
is a least non-negative real number $D(\phi)$ (called the defect) for which

$|\phi(gh) - \phi(g) - \phi(h)| \le D(\phi)$

for all g, h ∈ G.

Furthermore, a quasimorphism is homogeneous if $\phi(g^n) = n\phi(g)$ for all g ∈ G
and all integers n.

If $\phi$ is any quasimorphism, the homogenization of $\phi$, denoted $\bar\phi$, is defined by

$\bar\phi(g) := \lim_{n \to \infty} \phi(g^n)/n$

It is a fact that $\bar\phi$ is a homogeneous quasimorphism, and satisfies $D(\bar\phi) \le 2D(\phi)$.
See [8], Lemma 2.58. The set of homogeneous quasimorphisms on G is a real vector
space Q(G). The subspace with D = 0 consists precisely of the homomorphisms
$H^1(G;\mathbb{R})$, and D makes the quotient $Q/H^1$ into a Banach space.

There is a duality between quasimorphisms and stable commutator length, known
as Generalized Bavard Duality. The statement of this duality theorem is:

Theorem 5.2 (Generalized Bavard Duality [8], Thm. 2.79). Let G be a group.
Then for any $\sum t_ig_i \in B_1^H(G)$ there is an equality

$\mathrm{scl}(\sum t_ig_i) = \frac{1}{2} \sup_{\phi \in Q/H^1} \frac{\sum t_i\phi(g_i)}{D(\phi)}$

A special case of this theorem was established by Bavard in [1]. Notice that this
theorem is "complementary" to Lemma 3.4: an admissible surface certifies an upper
bound for scl, whereas a homogeneous quasimorphism certifies a lower bound.

An important and useful class of quasimorphisms are the (big) counting
quasimorphisms, defined by Rhemtulla [28], and rediscovered by Brooks [3]. Recall
the definition of the counting functions $C_\sigma$ from § 2.3, and their antisymmetrization
$H_\sigma := C_\sigma - C_{\sigma^{-1}}$. Given a set of reduced words S ⊂ F, the function
$H_S := \sum_{\sigma \in S} H_\sigma$ is a quasimorphism, and its value on v counts the difference
in the number of copies of σ and of $\sigma^{-1}$ for each σ ∈ S. The homogenization
counts the difference of the number of copies in the (cyclically reduced) cyclic word
v.

While big counting quasimorphisms are intuitively very natural, it will be
technically easier for us to work with small counting quasimorphisms. As above, let
S ⊂ F, and define

$c_S(v) =$ maximal number of disjoint copies of elements of S in v.

Then $h_S = c_S - c_{S^{-1}}$ is a quasimorphism, the small counting quasimorphism on S.
See e.g. [8] § 2.3.2. In contrast to big counting quasimorphisms, for which bounding
the defect proves difficult, small counting quasimorphisms have a uniformly
bounded defect.

Lemma 5.3. For any S ⊂ F, we have $D(h_S) \le 3$ and $D(\bar h_S) \le 6$.

Proof. This is Lemma 5.1 from [11]. □

5.2. Construction of the quasimorphism.

Proposition 5.4. Let v be a random reduced word in the commutator subgroup of
length n. Then there is an explicit construction of a homogeneous quasimorphism,
so that for all ε > 0 there are constants C > 1 and c > 0 such that with probability
$1 - O(C^{-n^c})$, the quasimorphism certifies the inequality

$\mathrm{scl}(v) \ge \frac{1}{1+\epsilon} \cdot \frac{n\log(2k-1)}{12\log(n)}$

Proof. Recall our notation m = log(n)/log(2k - 1) where k is the rank of the free
group F. Fix L = 1 + ε for ε > 0, and partition the cyclic word v into adjacent
disjoint subwords of length Lm. Note that there may be some small remainder if
Lm does not divide n; ignore this gap, as it will be insignificant for our purposes.
Let S be the collection of these subwords.

Lemma 5.5. For L = 1 + ε and S as above, there exist C > 1 and c > 0 such that
with probability $1 - O(C^{-n^c})$, there is a subset S' ⊂ S with

card{S - S') < 

such that for no σ ∈ S' does $\sigma^{-1}$ appear in v.



Proof. Repeating the content of § 2.5 while assuming that the words in S are disjoint
only simplifies the arguments, so Proposition 2.6 still holds in this case. □

The certifying quasimorphism will be $\bar h_{S'}$. By construction,

hs' («) > £^ - n^-'^+'Z^ - 1 - C(50-i (f ), 

and $c_{(S')^{-1}}(v) = 0$ by Lemma 5.5. By Lemma 5.3, $D(\bar h_{S'}) \le 6$, so Bavard duality
gives

$\mathrm{scl}(v) \ge \frac{\bar h_{S'}(v)}{2D(\bar h_{S'})} \ge \frac{1}{1+\epsilon} \cdot \frac{n\log(2k-1)}{12\log(n)} - o(n/\log(n)).$

The statement of the proposition is obtained by repeating the argument with ε/2; the
multiplicative factor 1/(1 + ε) then renders the o(n/log(n)) unnecessary. □
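
To make the definition of the small counting quasimorphism concrete, here is a minimal sketch in Python; the paper itself contains no code, and the function names are ours. It assumes words are strings with upper case denoting inverse letters, and it evaluates $c_S$ on a linear (not cyclic) word. The maximum number of disjoint copies is found by the standard greedy interval-scheduling argument: occurrences are scanned in order of right endpoint, and each one that does not overlap the previously kept occurrence is kept. The homogenization is not computed.

```python
def inverse_word(w):
    # Reverse the word and swap case: upper case denotes inverse generators.
    return w[::-1].swapcase()

def small_count(S, v):
    """Maximal number of pairwise disjoint copies of elements of S in the word v."""
    # Record every occurrence as (end, start) for the half-open interval [start, end).
    occurrences = []
    for s in set(S):
        if not s:
            continue
        start = v.find(s)
        while start != -1:
            occurrences.append((start + len(s), start))
            start = v.find(s, start + 1)
    # Greedy interval scheduling: process occurrences by earliest right endpoint.
    occurrences.sort()
    count, last_end = 0, 0
    for end, start in occurrences:
        if start >= last_end:
            count += 1
            last_end = end
    return count

def small_counting_quasimorphism(S, v):
    """h_S(v) = c_S(v) - c_{S^{-1}}(v) for a collection S of reduced words."""
    S_inv = [inverse_word(s) for s in S]
    return small_count(S, v) - small_count(S_inv, v)
```

For example, with S = {ab} and v = abab, the two copies of ab are disjoint and no copy of BA occurs, so the sketch returns 2; on the inverse word BABA it returns -2.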

6. Computer experiments and a surprisingly good heuristic 

Recall that in the proof of Proposition 4.2, we constructed a surface by gluing
random tripods. The length of the edges of the tripods was m/2 = log(n)/2 log(2k - 1),
but each edge of each tripod was extended to a maximal joint before gluing.
If u = xy and u' = xy' are reduced words with a common nonempty prefix x,
the expected length of the common prefix of y and y' is $1/(2k-1) + 1/(2k-1)^2 + \cdots = 1/(2k-2)$.
This suggests that the average edge length of an extremal surface should
be at least m/2 + 1/(2k - 2), and therefore that the value of scl should be at most
$\frac{n}{12}\left(\frac{\log(n)}{2\log(2k-1)} + \frac{1}{2k-2}\right)^{-1}$.

Without a really sound theoretical justification, we nevertheless made the predic- 
tion that this heuristic correction should more accurately match the actual average 
value of scl, and tested this experimentally. 

Figure 9 displays the result of a computer experiment. We computed the scl of
20 random words in $[F_2, F_2]$ of lengths between 70 and 240 (inclusive) in steps
of 10. The upper solid line indicates the theoretical value n log(2k - 1)/6 log(n)
from Theorem 4.1; the dots are the actual averages, and the lower dashed line
(passing in a very satisfying way through the experimental dots!) is the heuristic
$\frac{n}{12}\left(\frac{\log(n)}{2\log(2k-1)} + \frac{1}{2k-2}\right)^{-1}$.
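
The two curves compared in the experiment are simple closed-form expressions, and the following sketch evaluates them. It is only an illustration of the formulas in the text; it does not compute scl itself. The function names and the optional `correction` parameter are ours, and rank k ≥ 2 is assumed so that 1/(2k - 2) is defined.

```python
import math

def scl_theoretical(n, k):
    """Leading-order value n*log(2k-1) / (6*log(n)) from the Random Rigidity Theorem."""
    return n * math.log(2 * k - 1) / (6 * math.log(n))

def scl_heuristic(n, k, correction=None):
    """Heuristic value (n/12) * (log(n)/(2*log(2k-1)) + 1/(2k-2))**(-1).

    `correction` is the term added to the edge length m/2; it defaults to
    1/(2k-2), the expected extra length gained by extending to a maximal joint.
    """
    if correction is None:
        correction = 1.0 / (2 * k - 2)
    edge_length = math.log(n) / (2 * math.log(2 * k - 1)) + correction
    return n / (12 * edge_length)

# Lengths 70, 80, ..., 240 in the free group of rank 2, as in the experiment.
for n in range(70, 250, 10):
    print(n, round(scl_theoretical(n, 2), 3), round(scl_heuristic(n, 2), 3))
```

Setting `correction=0` recovers the theoretical value, which makes explicit that the heuristic differs from the theorem only by the additive term 1/(2k - 2) in the average edge length.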

Appendix A. Directed graphs and Markov chains 

The purpose of this appendix is firstly to put the estimates obtained in § 2
into the more general context of the theory of nonreversible Markov chains, and
secondly to indicate which aspects of the theory developed above can be expected
to generalize easily to hyperbolic groups and spaces, and which aspects require new
ideas. The main results of the paper do not depend logically on the results or
conjectures in this appendix.

Figure 9. Experimental computation of scl on random words in
$F_2$ of length between 70 and 240, and comparison with (asymptotic)
theoretical and heuristic values.

Let $X_1$ be the directed graph whose vertices are the generators of F, and whose
(directed) edges are the (ordered) non-inverse pairs. A random word v of length n
can be interpreted as a random walk on $X_1$ (where edges have the uniform probability)
starting at a random vertex (also with the uniform probability). This graph
is ergodic (i.e. there is a directed path from any vertex to any other vertex) and
aperiodic (i.e. the gcd of the lengths of the directed loops is 1).

For any i let $X_i$ be the directed graph whose vertices are the elements of $F_i$, and
whose (directed) edges are the elements of $F_{i+1}$, where an edge g starts/ends at its
prefix/suffix respectively of length i. Note for each i that $X_i$ is (2k - 1)-regular,
ergodic and aperiodic. Again, a random word v of length n can be interpreted as a
random walk on $X_i$ of length n - i + 1 starting at a random vertex.

Each $X_i$ determines a nonreversible Markov chain (in the obvious way), with
stationary probability π the uniform probability measure on vertices (i.e. such that
each vertex has weight $1/(2k)(2k-1)^{i-1}$), and Markov kernel $P_i(x,y) = 1/(2k-1)$
if there is a directed edge from x to y; i.e. if x and y are reduced words of length
i, and the suffix of x of length i - 1 is equal to the prefix of y of length i - 1.
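
The graphs $X_i$ are small enough to build explicitly for low rank and small i. The sketch below is ours and not from the paper: it encodes a generator as a positive integer g and its inverse as -g, constructs $X_i$ with one edge per reduced word of length i + 1, and checks that the uniform measure is stationary. With uniform π and constant transition probability 1/(2k - 1), stationarity is equivalent to every vertex having in-degree 2k - 1.

```python
def reduced_words(k, length):
    """All reduced words of a given length in the free group of rank k.

    Letters are nonzero integers: g stands for a generator, -g for its inverse.
    """
    letters = list(range(1, k + 1)) + [-g for g in range(1, k + 1)]
    words = [(a,) for a in letters]
    for _ in range(length - 1):
        words = [w + (a,) for w in words for a in letters if a != -w[-1]]
    return words

def build_graph(k, i):
    """Return (vertices, out_edges) for X_i: vertices F_i, one edge per element of F_{i+1}."""
    vertices = reduced_words(k, i)
    out_edges = {w: [] for w in vertices}
    for g in reduced_words(k, i + 1):
        # The edge g runs from its prefix of length i to its suffix of length i.
        out_edges[g[:i]].append(g[1:])
    return vertices, out_edges

def is_uniform_stationary(vertices, out_edges, k):
    """Check that every vertex has in-degree 2k-1, so the uniform measure is stationary."""
    in_degree = {w: 0 for w in vertices}
    for targets in out_edges.values():
        for t in targets:
            in_degree[t] += 1
    return all(d == 2 * k - 1 for d in in_degree.values())

vertices, out_edges = build_graph(2, 3)
print(len(vertices), is_uniform_stationary(vertices, out_edges, 2))
```

For k = 2 and i = 3 this produces $2k(2k-1)^{i-1} = 36$ vertices, each with out-degree and in-degree 3.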

For an introduction to the theory of Markov chains, see [19]. We remark that we
use only the most elementary aspects of the theory in this paper, since our Markov 
chains always have discrete time and finite state space. 

A.1. Chernoff inequalities for nonreversible Markov chains. We would like
to estimate the rate of convergence of random sums to the equilibrium; that is, we
want to estimate the probability that $|n^{-1}\sum_{j=1}^n f(x_j) - \int f\,d\pi|$ is bigger than γ,
for some function f on the vertices of $X_i$ (i.e. on $F_i$). In the sequel we denote
$\int f\,d\pi$ by $\mathrm{mean}_\pi(f)$, or just mean(f) if π is understood.

As is well-known, for reversible Markov chains, the rate of convergence is governed
by the spectral gap (i.e. the difference between 1 and the second largest
eigenvalue) of the (symmetric) Markov kernel P. For nonreversible Markov chains,
the relevant quantity is the smallest nonzero eigenvalue $\lambda_1$ of $L = \mathrm{Id} - (P + P^*)/2$.
In general $P^*$ is defined by $P^*(x,y) = \pi(y)P(y,x)/\pi(x)$, so in our context $P^*$ is
just the transpose $P^T$.

Let f be normalized to have $\|f - \mathrm{mean}(f)\|_\infty \le 1$ and $\|f - \mathrm{mean}(f)\|_2 \le 1$.
Let q be an initial distribution, and define $N_q = \|q/\pi\|_2$ (note that we always have
$N_q \le \min_x(\pi(x))^{-1/2}$). Then the main Chernoff-type inequality, due to Lezaud, is
as follows:

Theorem A.l (Lezaud [5S], Thm. 1.1 (cf. Rmk. 1.3)). With notation as above, 
there is an inequality 

$P(n^{-1}\sum_{j=1}^n f(x_j) - \mathrm{mean}(f) \ge \gamma) \le N_q e^{-\lambda_1 n\gamma^2/8}$

Remark A.2. Replacing f by -f gives the same bound on
$P(n^{-1}\sum_{j=1}^n f(x_j) - \mathrm{mean}(f) \le -\gamma)$.

Remark A.3. It is possible to control the rate of convergence in terms of other
kinds of spectral data, for instance, the second smallest eigenvalue $\lambda_1$ of $\mathrm{Id} - PP^*$.
However for the Markov chains $X_i$ as above with i ≥ 2, the multiplicative reversibilization
$PP^*$ has many distinct eigenvectors of eigenvalue 1, so $\lambda_1 = 0$. Another
approach is to work directly with the smallest positive singular value of the (nonsymmetric)
matrix Id - P; this approach is favored by Dinwoodie [18].

Remark A.4. Lezaud's estimate is not in itself strong enough to derive Proposition 2.3,
because the variance of a counting function $C_\sigma$ is too big. Nevertheless,
our proof of Proposition 2.3 owes something to the approach of Lezaud, and also to
the earlier work of Dinwoodie [18] mentioned above (especially the implicit estimate
of the random covering time in Lemma 2.4).

A.2. Estimating $\lambda_1$. The following estimate on $\lambda_1$ in terms of the spectrum of P
is obtained by Chung:

Theorem A.5 (Chung [14], Thm. 4.3). If X is a directed graph, the eigenvalue $\lambda_1$
of L is related to the (ordered) eigenvalues $\rho_i$ of P as follows:

$\min_{i \ne 0}(1 - |\rho_i|) \le \lambda_1 \le \min_{i \ne 0}(1 - \mathrm{Re}(\rho_i))$

Remark A.6. Note that Chung proves her theorem for arbitrary (not necessarily
regular) graphs, in which case the Laplacian L has the more complicated form

$L = \mathrm{Id} - \frac{\Phi^{1/2}P\Phi^{-1/2} + \Phi^{-1/2}P^*\Phi^{1/2}}{2}$

where Φ is the diagonal matrix whose entries are the values of π. For a regular
graph, Φ is a scalar multiple of the identity and $P^* = P^T$, so this simplifies to
$\mathrm{Id} - (P + P^T)/2$ which agrees with the definition of L in Theorem A.1.

Lemma A.7. For $L = \mathrm{Id} - (P_i + P_i^T)/2$ where $P_i$ is the probability matrix for $X_i$,
there is an estimate $\lambda_1 \ge$ const. > 0 where const. does not depend on i.

Proof. By Theorem IA.5[ it suffices to obtain upper bounds on the absolute values 
\pi\ of the spectrum of Pi. But the spectrum of Pi is equal to the spectrum of Pi 
for any i (padded by zeros), since the traces of all powers and P( are equal. To 
see this, observe that these traces count the number of periodic cycles in Xi and Xi 
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of period j, but such cycles in either case are in bijection with bi- infinite periodic 
words with period j. 

So it suffices to show that the spectrum of Pi has a unique eigenvalue 1 and 
all other eigenvalues strictly less than 1 in absolute value. This follows from the 
aperiodicity and ergodicity of Xi . □ 

Incidentally, $X_1$ is a reversible Markov chain, and therefore the spectrum of $P_1$
is real, so the same is true for the spectrum of all $P_i$.
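
The trace argument in the proof of Lemma A.7 can be checked numerically for small
cases. The following is a minimal sketch, not the authors' code. It assumes that the
vertices of $X_i$ are the reduced words of length $i$ in a free group of rank $k$, that
there is an edge from $w$ to each word obtained by dropping the first letter of $w$ and
appending a letter (without cancellation), and that every edge has probability
$1/(2k-1)$. The function names are hypothetical.

```python
import numpy as np

def reduced_words(k, i):
    """All reduced words of length i in a free group of rank k.

    Letters are nonzero integers: g is a generator and -g is its inverse.
    """
    letters = [g for g in range(-k, k + 1) if g != 0]
    words = [(x,) for x in letters]
    for _ in range(i - 1):
        words = [w + (x,) for w in words for x in letters if x != -w[-1]]
    return words

def transition_matrix(k, i):
    """Probability matrix P_i of X_i (assumed edge rule: shift and append)."""
    words = reduced_words(k, i)
    index = {w: n for n, w in enumerate(words)}
    letters = [g for g in range(-k, k + 1) if g != 0]
    P = np.zeros((len(words), len(words)))
    for w in words:
        for x in letters:
            if x != -w[-1]:  # appending x must not cancel the last letter
                P[index[w], index[w[1:] + (x,)]] = 1.0 / (2 * k - 1)
    return P

def power_traces(P, max_period):
    """Traces of P^j for j = 1, ..., max_period."""
    return [np.trace(np.linalg.matrix_power(P, j))
            for j in range(1, max_period + 1)]

# Compare the traces of powers of P_1, P_2 and P_3 in rank 2.
base = power_traces(transition_matrix(2, 1), 6)
for i in (2, 3):
    print(i, np.allclose(power_traces(transition_matrix(2, i), 6), base))
```

Comparing traces of powers, rather than computed eigenvalues, avoids numerical
trouble with the repeated eigenvalue zero of $P_i$ when $i > 1$. In this model $P_1$ is a
symmetric matrix, which is consistent with the remark that its spectrum is real.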

A.3. Cheeger constants in $X_i$. There are other methods to estimate $\lambda_1$ for a
directed graph, via a generalization of the classical Cheeger inequality. If $X$ is a
regular directed graph, the Cheeger constant $h(X)$ is the infimum of $|\partial U|/|U|$ over
all subsets $U$ of vertices of $X$ with cardinality $|U| \le |X|/2$, where $\partial U$ is
the set of elements of the complement joined by a directed edge from $U$ to $U^c$.
The significance of this quantity for $\lambda_1$ is the following theorem of Chung:

Theorem A.8 (Chung [14], Thm. 5.1). Let $X$ be a directed graph. Then

$$2h(X) \ge \lambda_1 \ge h^2(X)/2$$

For the sake of interest, we show that the Cheeger constants of the $X_i$ are all
equal, which gives another proof of Lemma A.7.

Lemma A.9. For any $i$, there is an equality $h(X_i) = h(X_1)$.

Proof. We give a sketch of a proof. 

Given $U$ a subset of $X_i$ with $|U| \le |X_i|/2$, let $V$ denote the set of suffixes of $U$
of length $i - 1$, and let $V'$ denote the set of words obtained from $V$ by appending
a letter. Then $\partial U = V' \setminus U$. Also, let ${}'V$ denote the set of words obtained from $V$
by prepending a letter. Then $|V'| = |{}'V| = |V|(2k - 1)$ and $U \subset {}'V$. Choose $U$ so
that

$$|\partial U| = |V' \setminus U| = h(X_i)|U| \le h(X_i)|V'|$$

Note that either $|V| \le |X_{i-1}|/2$, or else we may obtain a lower bound on $h(X_i)$
from the difference $|V| - |X_{i-1}|/2$; for the sake of argument, therefore assume the
former.

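Now think of $V$ as a subset of $X_{i-1}$, and let $W$ denote the set of suffixes of $V$
of length $i - 2$, and define $W'$ and ${}'W$ analogously to above. Then $\partial V = W' \setminus V$ by
definition. Moreover, $|W' \setminus V|(2k - 1) = |V' \setminus {}'V|$, since each element of $W' \setminus V$ can
be prepended with $(2k - 1)$ different letters to produce an element of $V' \setminus {}'V$. Since
also $|V|(2k - 1) = |V'| \ge |U|$ we deduce

$$h(X_{i-1}) \le \frac{|\partial V|}{|V|} = \frac{|W' \setminus V|}{|V|} = \frac{|V' \setminus {}'V|}{|V'|} \le \frac{|V' \setminus U|}{|U|} = \frac{|\partial U|}{|U|} = h(X_i)$$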

□ 

A.4. Hyperbolic groups. For an introduction to hyperbolic groups, see Gromov
[21]. A finitely generated group $G$ is hyperbolic if it is coarsely negatively curved
on a large scale. This can be expressed in several equivalent ways in terms of the
geometry of the Cayley graph; the most useful characterizations are

(1) $\delta$-thinness of triangles;

(2) a linear isoperimetric inequality; and

(3) all asymptotic cones are $\mathbb{R}$-trees.

The adjective "hyperbolic" comes from the close (metric) resemblance to hyperbolic
geometry. But there is another sense in which such groups are hyperbolic, namely 
in the dynamics of the (symbolic) geodesic flow. 

Cannon showed [13] that in hyperbolic groups, a set of representative shortest
words in any given generating set can be enumerated by a finite state automaton. 
In the language of digraphs, one version of Cannon's theorem can be expressed as 
follows. 

Let $G$ be a hyperbolic group with a symmetric generating set $S$. Let $\Gamma$ be a finite
directed graph with a distinguished (initial) vertex, and edges labeled by elements
of $S$, in such a way that there is at most one edge with a given label emanating
from each vertex. A directed path $\gamma$ in $\Gamma$ starting at the initial vertex determines a
word $w(\gamma)$ in the generators $S$, and by evaluation, an element of $G$. Cannon shows
that one can find such a $\Gamma$ for which there is a 1-1 correspondence between such
directed paths and elements of $G$, and moreover for which every word $w(\gamma)$ is a
geodesic, i.e. it is of shortest length among all words in $S^*$ representing a given
element of $G$. In more geometric terms, let $\tilde{\Gamma}$ denote the universal cover of $\Gamma$ (it
is also a directed graph), and let $\Gamma'$ be the subgraph of $\tilde{\Gamma}$ which is the union of
all directed rays starting at some lift of the initial vertex. Then $\Gamma'$ embeds in the
Cayley graph $C_S(G)$ in an edge-label respecting way as a spanning tree, and every
directed path in $\Gamma'$ is a geodesic in $C_S(G)$.
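
To make the correspondence between directed paths and group elements concrete, here
is a minimal sketch for the simplest case, a free group of rank $k$ with its standard
generators, where geodesic words are exactly the reduced words. The automaton below
(an initial vertex plus one vertex per letter) and the function names are assumptions
chosen for illustration; they are not taken from Cannon's construction for a general
hyperbolic group.

```python
def free_group_automaton(k):
    """Labeled digraph whose directed paths from "start" spell reduced words.

    Letters are nonzero integers: g is a generator and -g is its inverse.
    Each state other than "start" records the last letter read.
    The graph is stored as {state: {label: next_state}}.
    """
    letters = [g for g in range(-k, k + 1) if g != 0]
    graph = {"start": {x: x for x in letters}}
    for last in letters:
        # Any letter except the inverse of the last one may follow.
        graph[last] = {x: x for x in letters if x != -last}
    return graph

def count_paths(graph, start, n):
    """Number of directed paths of length n beginning at `start`."""
    counts = {start: 1}
    for _ in range(n):
        following = {}
        for state, c in counts.items():
            for target in graph[state].values():
                following[target] = following.get(target, 0) + c
        counts = following
    return sum(counts.values())

# Paths of length n correspond to elements of word length n.
k, n = 2, 5
print(count_paths(free_group_automaton(k), "start", n))
print(2 * k * (2 * k - 1) ** (n - 1))  # size of the sphere of radius n
```

In this toy case the two printed numbers agree, because each directed path of length
$n$ spells a distinct reduced word and every reduced word arises this way.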

In this language, there is a correspondence between "random" words in $G$, and
"random" directed walks in $\Gamma$. One thinks of $\Gamma$ as a topological Markov chain,
and then one can assign probabilities to the edges (the transitions between states)
in a way which maximizes the entropy. For such an assignment, the pushforward
measure from walks of length $n$ to the sphere of radius $n$ in $C_S(G)$ is coarsely
equivalent to the uniform measure on the sphere, and the limit as $n \to \infty$ converges
to the Patterson-Sullivan measure on the Gromov boundary $\partial G$ (see e.g. Coornaert-
Papadopoulos [16]).
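
For an ergodic (irreducible) topological Markov chain, the entropy-maximizing
assignment of edge probabilities has a standard closed form, often called the Parry
measure: if $A$ is the adjacency matrix with Perron-Frobenius eigenvalue $\lambda$ and
positive right eigenvector $v$, then $p_{ij} = A_{ij} v_j / (\lambda v_i)$. The sketch below
computes it with NumPy. It assumes $A$ is irreducible, which (as discussed next) need
not hold for the whole graph; the function name is hypothetical.

```python
import numpy as np

def parry_measure(A):
    """Maximal-entropy transition probabilities for an irreducible matrix A.

    Returns (lam, P), where lam is the Perron-Frobenius eigenvalue of A and
    P[i, j] = A[i, j] * v[j] / (lam * v[i]) for the Perron eigenvector v.
    Assumes A is non-negative and irreducible.
    """
    A = np.asarray(A, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(A)
    top = np.argmax(eigenvalues.real)          # Perron eigenvalue is real and maximal
    lam = eigenvalues[top].real
    v = np.abs(eigenvectors[:, top].real)      # fix the sign so that v > 0
    P = A * v[np.newaxis, :] / (lam * v[:, np.newaxis])
    return lam, P

# Toy example: two states, where state 1 must be followed by state 0.
lam, P = parry_measure([[1, 1], [1, 0]])
print(lam)
print(P)
print(P.sum(axis=1))  # each row is a probability distribution
```

For the letter states of the free group automaton (every letter may be followed by
any letter except its inverse), $v$ is constant and the formula gives the uniform
probability $1/(2k-1)$ on each edge.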

A significant technical issue is that the graph $\Gamma$ is not typically ergodic. Given a
general directed graph $\Gamma$, one can form a new directed graph without cycles, whose
vertices are the "communicating classes" of vertices in $\Gamma$ (i.e. equivalence classes
of the relation ~ where $u \sim v$ if there is a directed path from $u$ to $v$ and another
directed path from $v$ to $u$). Each vertex of the new graph corresponds to an ergodic
subgraph of $\Gamma$, whose adjacency matrix has a real, non-negative (Perron-Frobenius)
eigenvalue.
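
The decomposition into communicating classes can be sketched as follows. This is an
illustrative implementation, not the authors': it finds the classes from a boolean
transitive closure (adequate for small graphs) and reports the spectral radius of each
class's adjacency submatrix. The example matrix and function names are hypothetical,
and a single vertex with no self-loop is treated as a class of its own.

```python
import numpy as np

def communicating_classes(A):
    """Partition the vertices of a digraph into communicating classes.

    A[u, v] > 0 means there is an edge from u to v. Two vertices are in
    the same class when each can be reached from the other.
    """
    A = np.asarray(A)
    n = A.shape[0]
    reach = (A > 0) | np.eye(n, dtype=bool)
    for _ in range(n):  # repeated squaring; n rounds is more than enough
        step = reach.astype(int)
        reach = reach | ((step @ step) > 0)
    mutual = reach & reach.T
    classes, seen = [], set()
    for u in range(n):
        if u not in seen:
            cls = [v for v in range(n) if mutual[u, v]]
            seen.update(cls)
            classes.append(cls)
    return classes

def perron_eigenvalue(A, cls):
    """Spectral radius of the adjacency matrix restricted to one class."""
    A = np.asarray(A, dtype=float)
    sub = A[np.ix_(cls, cls)]
    return float(max(abs(np.linalg.eigvals(sub))))

# Hypothetical example with three classes: {0, 1}, {2} and {3}.
A = np.array([[1, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 0, 0]])
for cls in communicating_classes(A):
    print(cls, perron_eigenvalue(A, cls))
```

For larger graphs a linear-time strongly connected components algorithm (for example
Tarjan's) would be the usual choice; the closure is used here only to keep the sketch
short.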

From the point of view of probability theory, only the vertices with maximal
eigenvalue are significant. It is an important consequence of a theorem of Coornaert
[15] that for hyperbolic groups, such vertices do not occur in series, but only in
parallel. It follows that this maximal eigenvalue $\lambda$ is also the growth rate of the
group; i.e. the unique $\lambda$ such that there are $\Theta(\lambda^n)$ words of length $n$. The fact
that such "maximal" vertices only occur in parallel means informally that there are
finitely many distinct classes $W$ so that all but $O(C^{-n})$ words of length $n$ fall into
one of the classes of $W$, and for words $w$ in a given class $W_i$, for each $\sigma$ of length
$L \log(n)/\log(\lambda)$ with $L < 1$, there is some $\mu_i(\sigma)$ (depending only on $\sigma$ and on the
class $W_i$) so that
i.e. the analogue of Proposition 2.3 holds for each class $W_i$ separately, and with
essentially the same proof. This leaves two problems before one can attempt to
generalize the construction in §1 to arbitrary hyperbolic groups: one must be able
to compare $\mu_i(\sigma)$ for different classes $i$, and one must be able to compare $\mu_i(\sigma)$ with
$\mu_i(\sigma^{-1})$. These problems are largely solved by the methods of [9], [10]; see especially
[10] § 3.7.

We believe that it should be straightforward (albeit technically involved) to
generalize the results of §3 to arbitrary hyperbolic groups, and therefore feel confident
in the following conjecture:

Conjecture A.10. Let $G$ be a hyperbolic group with finite generating set $S$, and
let $\lambda$ be such that the number of elements of length $n$ is $\Theta(\lambda^n)$. Let $v$ be a random
element of word length $n$, conditioned to lie in the commutator subgroup $[G, G]$.
Then for any $\epsilon > 0$ and $C > 1$,

$$|\mathrm{scl}(v) \log(n)/n - \log(\lambda)/6| < \epsilon$$

with probability $1 - O(n^{-C})$.

A similar analogue of Theorem 4.16 should also hold.

A.5. Hyperbolic manifolds. If $M$ is a closed hyperbolic $d$-manifold, it makes
sense to study the stable commutator length of random closed geodesics with length
in $[n - \delta, n + \delta]$ for some fixed $\delta$ (conditioned to be homologically trivial). The
geodesic flow on a hyperbolic manifold is the canonical example of an Anosov flow,
and the analogues of Lezaud's Chernoff-type inequality are the mixing theorems of
Pollicott [27] and others.

The correct analogue of $\log(\lambda)$ should be the exponential growth rate of the
number of orbits as a function of length, which is just $d - 1$ (i.e. the volume entropy)
where $d$ is the dimension. The following conjecture seems very reasonable:

Conjecture A.11. Let $M$ be a closed hyperbolic $d$-manifold. Fix some $\delta > 0$. Let
$\gamma$ be a random geodesic of length in $[n - \delta, n + \delta]$ conditioned to be homologically
trivial, and let $v$ be the corresponding conjugacy class in $\pi_1(M)$. Then for any $\epsilon > 0$
and $C > 1$,

$$|\mathrm{scl}(v) \log(n)/n - (d - 1)/6| < \epsilon$$

with probability $1 - O(n^{-C})$.

If true, this conjecture would say that one can recover (to any desired accuracy) 
the length of a random geodesic directly from the bounded cohomology of $\pi_1(M)$;
this interpretation is obviously very close to the spirit of Gromov's celebrated result 
discussed in the introduction. 